[Tamil text illegible in the source.]
2. Networking basics:

Learn the basics of computer networking, such as IP addressing, routing, firewalls, DNS and VPNs. A useful resource:
http://www.linuxhomenetworking.com/

3. Python:

Learn at least one programming language, preferably Python. A free book to start with:

https://pymbook.readthedocs.io/en/latest/

4. SQL/NoSQL/JSON 

Learn how data is stored in databases and how to query it with SQL, as well as NoSQL stores and the JSON format.

MySQL

NoSQL with Redis, MongoDB

http://freetamilebooks.com/ebooks/learn-mysql-in-tamil/
5. Visualizations

Learn to present data as charts and graphs that others can understand at a glance. Tools such as Matplotlib and Kibana are used for this.


6. Cloud services 

Processing big data needs a lot of memory and computing power, but most personal computers have only 4GB or 8GB of RAM. Providers such as AWS, Digital Ocean, Google Cloud Platform and Azure rent out such resources. It is very useful to learn to work with at least one cloud service provider.

With cloud services you can create virtual machines on demand and process terabytes (TB) of data.

7. The Big Data tools 

Learn tools such as Hadoop, Spark, Pig, Hive, Scikit Learn and TensorFlow.
https://www.kaggle.com/

Kaggle hosts data science competitions along with many real-world datasets. Taking part, and studying which algorithms others use to solve the same problems, is a good way to learn.

9. Community Contributions 

Contribute to open-source projects and take part in the communities around them; sharing what you learn helps both you and others.

Join the technology groups in your area and learn together with them. Meetup.com helps you find such groups.

[The rest of this item is illegible in the source.]



Machine Learning - Online Free Course - Andrew Ng

https://www.coursera.org/learn/machine-learning

[Tamil text illegible in the source.]





6. Introduction


Machine learning is often confused with ordinary programming. Writing programs that make computers carry out the tasks people do is not, in itself, machine learning; that is called automation.

In machine learning, the computer is first given data and learns from it, much as a person learns from experience. It then uses what it has learnt to carry out the task on its own, including on data it has never seen before.

To do all this, knowledge of mathematics, statistics and several other fields is needed besides programming. Machine learning is needed mainly in two situations: when a task is too complex to write down as explicit rules, and when a program has to keep adjusting to changing data. This second property is called Adaptivity.

Below we look at the kinds of tasks that call for machine learning.

Tasks performed by people: people easily do things such as driving a vehicle or recognizing someone from their voice, without consciously following rules. Such tasks are hard to write down as a program made of well-defined steps, but a machine learning program trained on enough examples can learn to do them. Building such systems also requires domain expertise, that is, knowledge of the field the task comes from.



Tasks beyond human capabilities: analyzing huge and complex datasets, such as astronomical data or large medical records, is beyond what people can do by hand. So we train computers to do it, and they find the data that fit particular patterns. Experts, for example medical experts, then study these findings and draw conclusions from them.

[A further passage here is illegible in the source.]



Let us understand the basics of learning through a few examples.

Bait shyness: When rats come across a new kind of food, they first eat only a very small amount of it. If the food makes them ill, they link that food with the illness and avoid it from then on. In other words, the rat uses its past experience of a food to decide whether that food is safe. Learning in machines follows the same idea.

Tasting a little of each new food is called sampling. The experiences collected this way are called training data. Deciding from the outcome whether a food is harmful or safe is called labeling. Using these to decide about food met in the future is called predicting the future data. Machine learning works in the same way: the computer is given sample data with labels, learns from them, and then predicts the labels of new data.



Pigeon's superstition: Not every association learnt from experience is useful. The psychologist B.F. Skinner showed this with an experiment on pigeons. He put a group of hungry pigeons in a cage fitted with a device that released food at regular intervals, regardless of what the birds were doing. When the food first arrived, each pigeon happened to be doing something, such as pecking or turning its head. Each bird linked the arrival of food with what it was doing at that moment and began to repeat that action more often, which made it more likely that the next delivery would again find it doing the same thing.

In reality there is no link between the pigeons' actions and the food; the two merely happened close together in time (temporal correlation). Learning a link from such coincidences leads to superstition. Machines can learn meaningless associations from data in the same way. To tell useful learning apart from superstition, a learner needs some prior knowledge about which kinds of association make sense.


Looking at the rat example again: by nature, the rat already "knows" that the food it ate can be linked to its illness, while other things that happened around the same time cannot. In the same way, we should train a machine starting from some basic knowledge given to it in advance. This is called inductive bias. "Biased" normally means one-sided or partial; inductive bias means giving a machine some prior knowledge beforehand and training it to make its decisions with a leaning towards that knowledge. The person who supplies such knowledge is usually a domain expert.


Machine learning can be classified in the following 4 ways.

1. Supervised vs Unsupervised Learning: When we teach the computer what the input should be, what the output should be, and how the two are related, and it makes predictions on that basis, this is called supervised/structured learning. For example, before buying okra (ladies' finger) at a shop, we look at its color and lightly pinch its tip. If it is green and tender we decide to buy it; if it is hard and brown we decide not to.

• All the factors we check to decide whether to buy the okra, such as its color and tenderness, together form the domain set. These are the inputs, denoted X.

• Values such as "buy" and "don't buy" are called labels. These are the outputs, denoted Y.

• A mapping function links each input to its output, for example: green -> buy, hard -> don't buy, tender -> buy, brown -> don't buy. This is called the rules set. f: X -> Y

• The part of the program that works out this rules set from the examples is called the learner.

• The function that uses what the learner has learnt to produce an output for a new input is called the predictor. Once the rules are learnt, we no longer have to work everything out afresh when we see a new okra; we decide whether to buy it based on the results of our earlier checks.
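The terms above (domain set X, labels Y, learner, predictor) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's method: the input is a made-up "firmness" score from 0 (tender) to 10 (hard), the sample S is invented, and the learner uses a simple 1-nearest-neighbour rule.

```python
from typing import Callable, List, Tuple

# Hypothetical sample S: (firmness score, label) pairs.
# Firmness is a made-up 0-10 score: 0 = tender, 10 = hard.
Sample = List[Tuple[float, str]]
S: Sample = [(1.0, "buy"), (2.5, "buy"), (7.0, "don't buy"), (9.0, "don't buy")]

def learner(sample: Sample) -> Callable[[float], str]:
    """Learner: builds a predictor from the labelled sample.

    The rule used is 1-nearest-neighbour: a new input gets the
    label of the closest input seen in the sample.
    """
    def predictor(x: float) -> str:
        _, label = min(sample, key=lambda pair: abs(pair[0] - x))
        return label
    return predictor

h = learner(S)   # the predictor
print(h(2.0))    # fairly tender -> buy
print(h(8.0))    # fairly hard -> don't buy
```

Nearest-neighbour is used only because it is the shortest learner to write; any rule that turns S into a function from X to Y plays the same role.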


This can be further divided into two kinds: classification and regression. In classification, the output falls into one of a fixed set of classes, just as in the example above the decision is one of two classes, "buy" or "don't buy". In regression, the output is a continuous numeric value. For example, scanning a baby in the womb and predicting how much it will weigh at birth, in kg, is a regression task.

In unsupervised learning, only the data is given; there are no labels. The computer has no answers to compare against, so it has to find on its own the patterns present in the data and group the data accordingly. This is mainly of two kinds: clustering and association. Grouping customers according to their preferences, based on the similar products they buy, is an example of clustering.

Studying the products that are bought together reveals sales patterns. For example, noticing that customers who buy one product also tend to buy another, and recommending that second product to other customers who buy the first, is an example of association. In this way the computer finds useful patterns hidden in the data without being told what to look for. This is called unsupervised/unstructured learning.
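Clustering can be illustrated with k-means, one common clustering algorithm (the text does not name a specific one). A minimal sketch, assuming each customer is described by a single hypothetical number, their monthly spend, and that we want two groups:

```python
from typing import List, Tuple

def assign(points: List[float], centers: List[float]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: abs(p - centers[k])) for p in points]

def kmeans_1d(points: List[float], centers: List[float], max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Plain k-means on 1-D data; no labels are used anywhere."""
    for _ in range(max_iter):
        groups = assign(points, centers)
        # Move each center to the mean of the points currently assigned to it
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, g in zip(points, groups) if g == k]
            new_centers.append(sum(members) / len(members) if members else c)
        if new_centers == centers:  # nothing moved: converged
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical monthly spend (rupees) of six customers
spend = [120.0, 150.0, 130.0, 900.0, 950.0, 1000.0]
centers, groups = kmeans_1d(spend, centers=[100.0, 1000.0])
print(groups)
print(centers)
```

The groups come only from how close the numbers are to each other; nobody told the algorithm which customers belong together.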


A combination of the structured and unstructured approaches is called semi-structured learning (more commonly known as semi-supervised learning). Here some of the data is labelled and the rest is not. Labelling every item by hand is not practical, but leaving everything unlabelled does not help either, so at least the important items need to be labelled. Recognizing people's faces or voices are examples where this approach is used.

In short: in structured learning, the model learns from training data in which every item is labelled. In unstructured learning, it finds patterns in data that has no labels at all. Semi-structured learning sits between the two.
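One simple way to use a few labels together with unlabelled data is self-training: the learner repeatedly labels the unlabelled point it is most sure about. The sketch below is a hypothetical illustration, assuming 1-D inputs and using the distance to the nearest labelled point as the measure of confidence; practical semi-supervised methods are more involved.

```python
from typing import List, Optional

def self_train(xs: List[float], labels: List[Optional[str]]) -> List[str]:
    """Fill in missing labels (None) one point at a time.

    At each step, the unlabelled point closest to some labelled point
    takes that point's label, so labels spread outward from the few
    labelled examples.
    """
    labels = list(labels)  # do not modify the caller's list
    if all(l is None for l in labels):
        raise ValueError("need at least one labelled point")
    while any(l is None for l in labels):
        best = None  # (distance, unlabelled index, labelled index)
        for i, li in enumerate(labels):
            if li is not None:
                continue
            for j, lj in enumerate(labels):
                if lj is None:
                    continue
                d = abs(xs[i] - xs[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        labels[i] = labels[j]
    return labels

xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
partial = ["A", None, None, "B", None, None]  # only two points are labelled
print(self_train(xs, partial))
```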

2. Passive vs Active Learning: When a learner simply learns from the data it is given, without asking for anything, it is called passive learning. Checking whether an email is spam is a good example. The spam filter learns from the emails that users mark as spam; based on what it has learnt, new emails that look like spam go to the spam folder and the rest go to the inbox.

Sometimes an email looks suspicious but cannot be classified with confidence; finding such unusual cases is called anomaly detection. If, instead of waiting, the learner itself picks such emails, asks the user to label them, and learns from the answers, this is called active learning.
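The advantage of active learning can be seen in a small, classic setting, assumed here for illustration: the inputs are points on a line and the true rule is a threshold (label 0 below it, 1 from it onward). A passive learner would wait for labels on many points; an active learner can choose which points to ask about and, by always asking in the middle of the remaining range, needs only about log2(n) labels. The `user_label` function is a hypothetical stand-in for the person who answers.

```python
from typing import Callable, List, Tuple

def active_threshold(points: List[float], ask: Callable[[float], int]) -> Tuple[float, int]:
    """Learn a 1-D threshold rule by choosing which points to query.

    Assumes labels are 0 below an unknown threshold and 1 from it onward.
    Returns (first point labelled 1, number of labels requested).
    """
    pts = sorted(points)
    lo, hi = 0, len(pts)  # the first 1-labelled index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if ask(pts[mid]) == 1:  # request the label of one chosen point
            hi = mid
        else:
            lo = mid + 1
    threshold = pts[lo] if lo < len(pts) else float("inf")
    return threshold, queries

def user_label(x: float) -> int:
    """Hypothetical user whose hidden rule is 'label 1 when x >= 637'."""
    return int(x >= 637)

points = [float(i) for i in range(1000)]
print(active_threshold(points, user_label))
```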

3. Adversarial Teacher Method: In tasks such as spam filtering, malware detection and biometric recognition, the data is sometimes created on purpose to mislead the learner. For example, spammers deliberately craft their emails so that they get past spam filters. The learner has to be designed to learn correctly even in the face of such deliberate attempts to deceive it.

4. Online vs Batch Learning: When data arriving in real time is learnt from as it comes in, this is called online learning. When data is first collected and then learnt from as one batch, this is called batch learning. A stock broker, for example, has to work online, reacting to prices as they change, whereas a population census works in batches. The census is conducted once every 10 years, covering decades such as 1970-80, 1981-90, 1991-2000 and 2001-10, and the data for each decade is processed together. This is an example of batch processing.

Many different algorithms are used for each of these types of machine learning.
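The difference between batch and online learning shows up even in a very simple "model" such as the average of a stream of values. In this sketch (the price stream is hypothetical), the batch version needs all the data at once, while the online version updates its estimate with each new value using mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n, and ends with the same result.

```python
from typing import List

def batch_mean(values: List[float]) -> float:
    """Batch learning: the whole dataset is available before learning starts."""
    return sum(values) / len(values)

class OnlineMean:
    """Online learning: the estimate is updated as each new value arrives."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.n += 1
        # mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        self.mean += (x - self.mean) / self.n
        return self.mean

prices = [101.5, 102.0, 100.8, 103.2, 104.0]  # hypothetical stream of share prices
online = OnlineMean()
for p in prices:
    online.update(p)  # a usable estimate exists after every single price
print(online.mean, batch_mean(prices))
```

The online version never stores the past prices, which is what makes it suitable for data that keeps arriving.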



7. Statistical Learning 


Statistical learning is the part of machine learning that uses statistics to learn from data and make predictions. A model built this way is called a statistical learning model. Its main components are described below.

Domain set: The set of all the inputs from which predictions are made is called the domain set. It is written X = {...} and is also called the instance space. Each individual element in it is called a domain point or instance.


$$X = \{x_1, x_2, \ldots, x_n\}$$


For example, suppose an algorithm has to predict the price of a 1000 sq ft plot of land. For this, the areas (in sq ft) of various plots whose prices are already known are given as inputs:


X = [10, 50, 150, ..., 600, 800]


Label set: The set of values to be given as output is called the label set, Y = {...}. The correct output for each input, here the price of each plot, has to be known in advance. These values are usually supplied by someone who knows the field well, called a "domain expert".


$$Y = \{y_1, y_2, \ldots, y_n\}$$


The prices of the plots above (in rupees) are as follows:


Y = [50, 95, 250, ..., 750, 999]


Mapping function: The function that links the two sets above, giving the correct output for each input, is called the mapping function (f). This is what our algorithm has to learn.




f : 10 sq ft -> 50 rupees ; 50 sq ft -> 95 rupees ; 150 sq ft -> 250 rupees ; ...

Probability Distribution: The sample data we give must be spread evenly across the full range of inputs. Suppose we train only on the prices of plots from 10 sq ft to 500 sq ft and then ask for the price of a 1000 sq ft plot; the prediction is likely to be wrong. For the predictions to be accurate, the training data has to cover all the ranges, for example plots of 10, 50, 150 ... sq ft, spread across every size. How the data is spread across the possible values is called the probability distribution. Only predictions made from data drawn this way can be relied on.

Sample data: The data we select and give to the learner is called the sample data, or training data. For example, if we have the prices of 500 plots, we take a few plots of 0-50 sq ft, a few of 50-100 sq ft, and so on, so that every range is represented. The data picked this way is called the sample data.
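Taking a few plots from each size range (0-50, 50-100, ...) is a form of stratified sampling. A minimal sketch, assuming 500 randomly generated plot areas between 0 and 1000 sq ft in place of real data; the function name and parameters are illustrative.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List

def stratified_sample(areas: List[float], bin_width: float, per_bin: int, seed: int = 0) -> List[float]:
    """Take up to `per_bin` areas from each range [0, w), [w, 2w), ...

    Every size range that occurs in the data is represented in the sample.
    """
    rng = random.Random(seed)
    bins: DefaultDict[int, List[float]] = defaultdict(list)
    for a in areas:
        bins[int(a // bin_width)].append(a)
    sample: List[float] = []
    for key in sorted(bins):
        group = bins[key]
        sample.extend(rng.sample(group, min(per_bin, len(group))))
    return sample

gen = random.Random(42)
areas = [gen.uniform(0, 1000) for _ in range(500)]  # 500 hypothetical plot areas (sq ft)
S = stratified_sample(areas, bin_width=50, per_bin=5)
print(len(S), sorted({int(a // 50) for a in S}))
```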



Learner: From the sample data we give it, our algorithm learns how the price of a plot depends on its area. An algorithm that learns in this way is called the learner. It is written A(S): the algorithm A applied to the sample data S.

$$A(S)$$

$$S = \big((x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\big), \quad (x_i, y_i) \in X \times Y$$
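As an illustration of a learner A(S), here is a sketch that fits a straight line, price = a + b * area, by ordinary least squares. The choice of a linear model is an assumption for illustration; the text does not say which algorithm is used. Only the values actually listed for X and Y are used; the elided entries are not filled in.

```python
from typing import Callable, List, Tuple

def least_squares_learner(sample: List[Tuple[float, float]]) -> Callable[[float], float]:
    """A(S): fit price = a + b * area by ordinary least squares; return predictor h."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    b = sxy / sxx            # slope: extra rupees per extra sq ft
    a = mean_y - b * mean_x  # intercept
    return lambda x: a + b * x

X = [10, 50, 150, 600, 800]  # areas listed in the text (sq ft)
Y = [50, 95, 250, 750, 999]  # prices listed in the text (rupees)
S = list(zip(X, Y))
h = least_squares_learner(S)
print(round(h(1000)))        # predicted price for a 1000 sq ft plot
```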



Predictor: Using what the learner has learnt, the function that predicts the label for new, unlabelled inputs is called the predictor. It is also known by other names, such as hypothesis or classifier (h = hypothesis). For example, if the price of an 800 sq ft plot has not been given, the predictor is what estimates a suitable price for it.

$$h: X \rightarrow Y$$

Validation data: The observations we use to check how accurate a predictor is are called validation data. To test a predictor, at least 30 observations are needed.

For example, out of our 500 sample data points, the learner learns from 300, and we use the remaining 200 to check how accurately the algorithm predicts. These 200 are the validation data. Usually about 25% of the sample data is set aside as validation data.
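The 300/200 split can be done by shuffling the sample once and setting part of it aside. A minimal sketch: the integers stand in for the 500 (area, price) pairs, and the fixed seed only makes the split repeatable.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def train_validation_split(data: List[T], n_validation: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle once, set n_validation items aside for validation, train on the rest."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable split
    return shuffled[n_validation:], shuffled[:n_validation]

samples = list(range(500))  # stand-ins for 500 (area, price) pairs
train, validation = train_validation_split(samples, n_validation=200)
print(len(train), len(validation))  # 300 200
```

Shuffling before splitting matters: if the data were sorted by area, taking the last 200 items without shuffling would leave the learner with no large plots at all.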

Loss / Risk: No prediction is 100% correct. The difference between the value the predictor gives and the true value is called the loss, or risk. When we test the predictor on the validation data, how far its predictions are from the true values tells us the risk. For example, suppose the true price of an 850 sq ft plot is 1200 rupees and our algorithm predicts 1190 or 1210 rupees. The prediction is off from the true price by 10 rupees either way; that difference is the loss.

The smaller this loss, the better the algorithm predicts. But if the algorithm fits the training data too closely in trying to reduce the loss, it may predict poorly on new data (this is explained later under over-fitting). Risk is of two kinds: the first is true risk and the second is empirical risk.

True Risk: When the price of a plot actually worth 1200 rupees is predicted as 1190 rupees, the 10-rupee difference is the true risk. It is also called generalization error, meaning the error that appears when the model is applied generally, to data beyond what it was trained on. In practice we cannot compute it directly, because the true values for all possible inputs are not known; so we estimate it from a smaller set of data, which gives the empirical risk.

R(h) stands for the risk of hypothesis h: the probability that the value h(x) predicted by the hypothesis differs from the true value f(x) given by the mapping function.


$$R(h) = \mathbb{E}_{x}\left[\mathbb{1}\left[h(x) \neq f(x)\right]\right] = \mathbb{P}_{x}\left[h(x) \neq f(x)\right]$$
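True risk can only be computed when the input distribution and the true mapping f are known, which in practice they are not. In the toy setting assumed below, inputs are uniform on [0, 1), f labels x as 1 when x >= 0.5, and the hypothesis h uses 0.6 instead. The two disagree exactly on [0.5, 0.6), so R(h) = 0.1, and averaging over a large random sample estimates that value.

```python
import random

def f(x: float) -> int:
    """True labelling rule (assumed for this illustration)."""
    return int(x >= 0.5)

def h(x: float) -> int:
    """A hypothesis whose threshold is slightly off."""
    return int(x >= 0.6)

def estimate_true_risk(n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P[h(x) != f(x)] with x ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    mistakes = sum(h(x) != f(x) for x in (rng.random() for _ in range(n)))
    return mistakes / n

print(estimate_true_risk(100_000))  # close to the exact value 0.1
```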


Empirical risk: We ask the predictor to predict the prices of the 200 plots kept aside for testing and compare each prediction with the true price. Averaging the errors over all of them gives an estimate of the risk, called the empirical risk. For m test samples it is computed as:


$$R_{emp}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\left[h(x_i) \neq f(x_i)\right]$$
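The empirical risk formula translates directly into code. The sketch below computes R_emp(h) with the 0-1 loss used in the formula, on a small made-up sample. For numeric outputs such as prices an exact match is rarely expected, so a loss such as the absolute difference is used instead (a swapped-in choice here). With it, the 850 sq ft example (true price 1200, predictions 1190 and 1210) gives an average loss of 10 rupees.

```python
from typing import Callable, List, Tuple

def empirical_risk(h: Callable[[float], int], sample: List[Tuple[float, int]]) -> float:
    """R_emp(h) = (1/m) * #{i : h(x_i) != y_i}, the 0-1 loss averaged over m samples."""
    m = len(sample)
    return sum(1 for x, y in sample if h(x) != y) / m

def mean_absolute_error(predicted: List[float], actual: List[float]) -> float:
    """Average absolute difference, a common loss for numeric outputs such as prices."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def h(x: float) -> int:
    return int(x >= 0.6)  # hypothetical threshold classifier

S = [(0.1, 0), (0.3, 0), (0.55, 1), (0.7, 1), (0.9, 1)]  # y_i = f(x_i), made up
print(empirical_risk(h, S))                             # 0.2: one mistake out of 5
print(mean_absolute_error([1190, 1210], [1200, 1200]))  # 10.0 rupees
```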


Empirical Risk Minimization: Many predictors can be built from the data with different algorithms. Using the validation data, we compute the empirical risk of each and pick the predictor with the lowest empirical risk. Choosing the predictor this way, by minimizing the empirical risk, is called Empirical Risk Minimization (ERM). In its formula, arg min stands for "argument of minimum", that is, the h for which R_emp(h) is smallest.

In other words, ERM finds the parameters of the algorithm that give the lowest risk on the data we have. It does not tell us how far we can trust the prediction, or how accurate it will be on new data. That question is answered by the PAC model.


$$\mathrm{ERM} = \arg\min_{h} R_{emp}(h)$$
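When the set of candidate predictors is finite, ERM can be carried out by brute force: compute R_emp(h) for every candidate and keep the smallest. A sketch under assumed conditions: the candidates are threshold rules h_t(x) = 1 if x >= t, for t = 0.0, 0.1, ..., 1.0, and the labelled sample is invented.

```python
from typing import Callable, List, Tuple

Hypothesis = Callable[[float], int]

def make_threshold(t: float) -> Hypothesis:
    """Hypothesis h_t(x) = 1 if x >= t else 0."""
    return lambda x: int(x >= t)

def r_emp(h: Hypothesis, sample: List[Tuple[float, int]]) -> float:
    """Fraction of sample points that h labels wrongly."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(hypotheses: List[Tuple[float, Hypothesis]], sample: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Return (t, R_emp) for the candidate with the smallest empirical risk."""
    t, best = min(hypotheses, key=lambda pair: r_emp(pair[1], sample))
    return t, r_emp(best, sample)

H = [(k / 10, make_threshold(k / 10)) for k in range(11)]  # finite class: t = 0.0, 0.1, ..., 1.0
S = [(0.1, 0), (0.3, 0), (0.45, 0), (0.55, 1), (0.7, 1), (0.9, 1)]
print(erm(H, S))  # (0.5, 0.0)
```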



8. Probably Approximately Correct (PAC Method)


This method estimates how far the predictions of a predictor can be trusted. The aim is a predictor whose predictions are probably approximately correct, that is, correct up to a small error with high probability. For this, the learner must avoid over-fitting and must use an inductive bias; the training data must be drawn i.i.d. (independent and identically distributed); and there must be enough of it. The number of samples needed is called the sample complexity. With the accuracy and confidence parameters, we can state how accurate the predictor's results are expected to be and with what confidence. The basic PAC model relies on the realizability assumption, namely that some hypothesis in the class predicts perfectly; when that assumption does not hold, the Agnostic PAC Model is used. Each of these is explained below.
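For a finite hypothesis class under the realizability assumption, a standard result (not derived in this text) says that ERM is probably approximately correct once the sample size m satisfies m >= ln(|H| / delta) / epsilon, where epsilon is the accuracy parameter and delta the confidence parameter. A small helper to evaluate that bound; the example class size of 11 is an assumption for illustration.

```python
import math

def pac_sample_size(h_size: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for ERM over a finite hypothesis class, under realizability.

    Bound: m >= ln(|H| / delta) / epsilon. With that many i.i.d. samples,
    with probability at least 1 - delta the ERM hypothesis has true risk <= epsilon.
    """
    return math.ceil(math.log(h_size / delta) / epsilon)

# e.g. a class of 11 threshold rules, 5% error, 99% confidence
print(pac_sample_size(11, epsilon=0.05, delta=0.01))  # 141
```

Tighter accuracy (smaller epsilon) costs samples in proportion to 1/epsilon, while higher confidence (smaller delta) or a larger class costs only logarithmically.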

Overfitting: When a learner is trained on the training data, it may fit that data so closely that it effectively memorizes it instead of learning the general pattern behind it. This is called overfitting.

Such a learner does very well on the examples it was trained on, but fails to predict new, unseen examples correctly. The risk measured on the training data is very low, while the true risk on new data stays high. Inductive bias is used to avoid overfitting.
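Here is a toy sketch of overfitting, under assumptions that are not in the original. A hypothetical "memorizing" learner stores every training pair in a lookup table and answers 0 for anything it has not seen. It reaches zero training error yet does badly on new inputs, which is exactly the gap between training risk and true risk described above.

```python
# A hypothetical memorizing learner: it stores the training pairs and
# answers 0 for any input it has not seen before.

def memorizing_learner(train):
    table = dict(train)
    return lambda x: table.get(x, 0)

def error_rate(h, data):
    """Fraction of (x, y) pairs that h labels incorrectly."""
    return sum(h(x) != y for x, y in data) / len(data)

# Hypothetical data whose true rule is: label 1 when x >= 5.
train = [(1, 0), (3, 0), (6, 1), (8, 1)]
test = [(2, 0), (4, 0), (5, 1), (7, 1), (9, 1)]

h = memorizing_learner(train)
print("training error:", error_rate(h, train))  # 0.0, every training point is memorized
print("test error:", error_rate(h, test))       # 0.6, all unseen positives are missed
```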

Inductive bias: Before the learner sees any data, we restrict it to choosing its predictor from a predefined set of rules, based on our prior knowledge of the problem. This built-in preference is called inductive bias. It is called "bias" because it steers the learner toward certain kinds of predictions, whatever labels appear in the data.

Restricting the learner to a hypothesis class in this way protects it from overfitting. But if the class is chosen badly, for example too narrow for the problem, the learner cannot fit the data well. Choosing a suitable inductive bias is therefore important.

Hypothesis Class: The set of hypotheses from which a learner may choose, as fixed by its inductive bias, is called its hypothesis class. A hypothesis class can be finite or infinite.

If the hypotheses in it can be counted, the class is finite. For example, when we log in to YouTube, suppose the videos recommended to us are chosen from a fixed, countable list of categories based on what we have watched before. The set of possible predictions is then a finite hypothesis class.

If instead the number of possible hypotheses has no limit, for example when the prediction can be any numeric value, the hypothesis class is infinite.


Sample complexity: This is the number of training examples a learning algorithm needs in order to produce a predictor that meets a required accuracy with a required confidence. The examples must be drawn at random from the whole population the model will be used on, and there must be enough of them.

If m examples are drawn from a distribution D, the training sample S as a whole is drawn from D^m (D to the power of m):

$$S \sim D^m$$

Here m is the number of examples drawn, and every example comes from the same distribution D.




The selected samples are drawn on an i.i.d. basis. i.i.d. stands for independently and identically distributed: each training example is drawn independently of the others, and all of them come from the same distribution. The learner learns from examples drawn in this way.

Realizability assumption: We assume that the hypothesis class contains at least one hypothesis that labels every example correctly, that is, one whose risk is zero. A learning algorithm that relies on this is said to make the realizability assumption.

In practice this often does not hold. Real data contains noise, so even the best hypothesis in the class may make some errors. The model that drops this assumption is called the Agnostic PAC model.
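The difference between the realizable and the agnostic setting can be sketched with toy threshold classifiers, a hypothetical setup that is not from the original. When the labels are generated by one of the hypotheses, ERM can reach zero empirical risk. Once a few labels are flipped to mimic noise, even the best hypothesis in the class makes errors.

```python
# Realizable vs. agnostic, sketched with hypothetical threshold classifiers
# h_t(x) = 1 if x >= t else 0 over the inputs 0..9 (all names are illustrative).

def risk(t, sample):
    """Empirical 0-1 risk of the threshold classifier h_t on the sample."""
    return sum((1 if x >= t else 0) != y for x, y in sample) / len(sample)

def best_risk(sample, thresholds=range(11)):
    """Smallest empirical risk achievable by any h_t in the finite class."""
    return min(risk(t, sample) for t in thresholds)

# Realizable case: labels come from h_5 itself, so some hypothesis is perfect.
realizable = [(x, 1 if x >= 5 else 0) for x in range(10)]

# Agnostic case: flip two labels to mimic noise; no threshold fits perfectly now.
noisy = [(x, 1 - y) if x in (2, 7) else (x, y) for x, y in realizable]

print("best risk, realizable:", best_risk(realizable))   # 0.0
print("best risk, agnostic:", best_risk(noisy))          # 0.2
```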

Accuracy parameter: This is how far the predictions of a predictor/classifier are allowed to be from the true values. It is denoted by ε (epsilon). If R(h) > ε, the predictor is counted as a failure; if R(h) <= ε, it is counted as approximately correct.

Confidence parameter: This is denoted by δ (delta). For each example, a prediction is either correct or incorrect (a correct prediction can be scored 1 and an incorrect one 0):

$$y_i = f(x_i) \quad \text{(correct)} \qquad\qquad y_i \neq f(x_i) \quad \text{(incorrect)}$$

Because the training sample is drawn at random, there is always some chance that it is unrepresentative. In that case the learner ends up with a predictor that is not approximately correct. δ is the probability of this failure. So, with probability at least 1 − δ, the learner outputs an approximately correct predictor. The value (1 − δ) is called the confidence parameter.
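To make ε and δ concrete, the sketch below evaluates a standard textbook bound for a finite hypothesis class under the realizability assumption. If m >= ln(|H|/δ)/ε, then with probability at least 1 − δ, ERM outputs a hypothesis whose true risk is at most ε. The function name and the numbers plugged in are illustrative assumptions; the original text does not give this formula.

```python
import math

def pac_sample_size(h_size, epsilon, delta):
    """Sample size that suffices for ERM over a finite hypothesis class H,
    assuming realizability: m >= ln(|H| / delta) / epsilon."""
    return math.ceil(math.log(h_size / delta) / epsilon)

# Hypothetical numbers: |H| = 1000, epsilon = 0.05, confidence 1 - delta = 0.95.
print(pac_sample_size(1000, 0.05, 0.05))   # 199
# Demanding more accuracy (smaller epsilon) needs more examples.
print(pac_sample_size(1000, 0.01, 0.05))   # 991
```

Note that the required sample size grows only logarithmically with |H| and 1/δ, but linearly with 1/ε.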

With these basics in place, let us look at simple linear regression through an example.



9. Linear Regression 


9.1 Simple Linear Regression 

Simple linear regression is one of the most basic algorithms used for prediction. From the training data it learns how one variable depends on another, and it represents that relationship as a straight line.

For example, the price of a pizza usually depends on its size. Suppose we have the diameters of a few pizzas and their prices. We can use them to predict the price of a pizza of a new size. To train the model, the X variable holds the pizza diameters and the Y variable holds their prices. These form the domain set (X) and the label set (Y).

x=[6,8,10,14,18,21] 
y=[7,9,13,17.5,18,24] 

Here the pizza diameter (in inches) is X, called the explanatory variable, and the price (in dollars) is Y, called the response variable. Let us first plot these values on a graph to see how they are related.

To draw the graph, we use matplotlib, a Python library for visualizing data. Its pyplot module provides functions for drawing many kinds of graphs. The code is given below.


https://gist.github.com/nithyadurai87/cb77831526033da63be0790f917efe63


import matplotlib.pyplot as plt

# Pizza diameters (inches) and prices (dollars), as nested lists
x = [[6], [8], [10], [14], [18], [21]]
y = [[7], [9], [13], [17.5], [18], [24]]

plt.figure()
plt.title('Pizza price statistics')
plt.xlabel('Diameter (inches)')
plt.ylabel('Price (dollars)')
plt.plot(x, y, '.')          # one dot per pizza
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()


The program displays the data points as a graph of price against diameter.






In this graph, the points lie roughly along a straight line that rises from left to right: as the diameter increases, the price also increases. In other words, the two variables have a linear relationship. Let us now find that line using an algorithm. The code is as follows.








https://gist.github.com/nithyadurai87/d94507f9052a6120dce5f20e31806cea


import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]

# Train a simple linear regression model on the pizza data
model = LinearRegression()
model.fit(x, y)

plt.figure()
plt.title('Pizza price statistics')
plt.xlabel('Diameter (inches)')
plt.ylabel('Price (dollars)')
plt.plot(x, y, '.')                    # training points
plt.plot(x, model.predict(x), '--')    # fitted line
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

# Predict the price of a 21-inch pizza
print("Predicted price = ", model.predict([[21]]))


Explanation of the program: sklearn (scikit-learn) is a Python library that provides many machine learning algorithms. From its linear_model package we import the LinearRegression() class. This is our predictor. We train it on our data by calling its fit() method.





Next, pyplot draws the trained model on the graph as a dashed line. Finally, the predict() function asks the trained model to predict the price of a 21-inch pizza.

Output of the program: Along with the graph, the program prints the predicted price. For a 21-inch pizza it prints [[22.46767241]].
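As a cross-check that does not need scikit-learn, NumPy's polyfit with degree 1 fits the same least-squares line. A fitted LinearRegression model stores these values in its intercept_ and coef_ attributes, so the numbers below should match them. This sketch is an illustration, not part of the original program.

```python
import numpy as np

# Pizza training data from the listing above: diameter (inches) -> price (dollars)
x = [6, 8, 10, 14, 18]
y = [7, 9, 13, 17.5, 18]

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, 1)
price_21 = intercept + slope * 21

print("intercept:", intercept)          # about 1.9655
print("slope:", slope)                  # about 0.9763
print("21-inch prediction:", price_21)  # about 22.4677
```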




In this graph, the dashed line drawn through the points is the model we have learned. Such a line (in general, a hyperplane) shows the price the algorithm predicts for each diameter. Notice that the line does not pass exactly through the training points: for each point there is a gap between the actual price and the predicted price.










The differences between the actual and predicted values on the training data are called residuals, or training error.

The model can also be wrong on data it has not seen. In our original data a 21-inch pizza costs 24 dollars, but the model, trained without that point, predicts about 22 dollars. This kind of error, made when predicting new data in general, is called the generalization error, or risk.

A function that measures this error is called a loss function, or cost function. The residual sum of squares is one such function: the sum of the squares of all the residuals. Training a model means finding the parameter values that make this error as small as possible.

Algorithm - Simple linear: Let us now see how the predicted values are actually calculated. The simple linear regression algorithm is:

$$y = \alpha + \beta x$$

Besides the explanatory and response variables, this formula has two parameters: α (the intercept term) and β (the coefficient). The algorithm's job is to find the values of these two parameters.



Once we have them, we can compute the risk of our model. To find α and β, we first need two quantities: variance and covariance.

Variance measures how far the values of the explanatory variable are spread out around their mean. For example, [1, 3, 5, 7, 9, 11] has a sample variance of 14. By contrast, [1, 5, 7, 10, 11] has a sample variance of 16.2, because its values lie farther from their mean on average.

Covariance measures how the explanatory and response variables change together. If the response tends to increase as the explanatory variable increases, the covariance is positive; if it tends to decrease, the covariance is negative.

The formulas are as follows.


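$$y = \alpha + \beta x$$

$$\alpha = \bar{y} - \beta \bar{x}$$

$$\beta = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}$$

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$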
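The formulas can also be evaluated by hand in plain Python, which makes each step visible. This is a minimal sketch using the pizza data from the earlier listings. The (n − 1) divisors match the formulas above and cancel out when β is computed.

```python
# Pizza training data: diameter (inches) and price (dollars)
x = [6, 8, 10, 14, 18]
y = [7, 9, 13, 17.5, 18]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Sample variance and covariance, dividing by n - 1 as in the formulas
var_x = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)

beta = cov_xy / var_x            # slope
alpha = y_mean - beta * x_mean   # intercept

print("var(x) =", var_x)                          # about 23.2
print("cov(x, y) =", cov_xy)                      # about 22.65
print("beta =", beta, "alpha =", alpha)           # about 0.9763 and 1.9655
print("21-inch prediction:", alpha + beta * 21)   # about 22.4677
```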


The numpy library provides functions for these calculations. Let us use them to compute the values needed by the formulas above.






https://gist.github.com/nithyadurai87/406747e718d04a4bc339f740b5f9de62


from sklearn.linear_model import LinearRegression
import numpy as np

x = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]

model = LinearRegression()
model.fit(x, y)

# np.mean of the squared residuals = residual sum of squares / number of examples
print("Mean squared residual (RSS / n) = ", np.mean((model.predict(x) - y) ** 2))
# ddof=1 divides by n - 1, matching the variance formula above
print("Variance = ", np.var([6, 8, 10, 14, 18], ddof=1))
print("Co-variance = ", np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1])
print("X_Mean = ", np.mean(x))
print("Y_Mean = ", np.mean(y))

Its output is as follows.

Mean squared residual (RSS / n) = 1.7495689655172406
Variance = 23.2
Co-variance = 22.650000000000002
X_Mean = 11.2
Y_Mean = 12.9
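Strictly speaking, the first printed value is the mean of the squared residuals (RSS divided by the number of examples), not the residual sum of squares itself. The sketch below computes both so the difference is visible. It uses np.polyfit for the fit in place of scikit-learn, which is an assumption made to keep it self-contained.

```python
import numpy as np

# Pizza training data (same as the listing above)
x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18])

# Least-squares line (np.polyfit used here instead of scikit-learn)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

rss = np.sum(residuals ** 2)    # residual sum of squares
mse = np.mean(residuals ** 2)   # RSS divided by the number of examples

print("RSS =", rss)             # about 8.7478
print("RSS / n =", mse)         # about 1.7496
```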





Now let us substitute these values into the formulas and check that they give about 22.47 dollars for a 21-inch pizza, as the program predicted.

y = α + βx
  = α + β(21)
  ≈ 1.96555 + (0.97629 * 21) ≈ 1.96555 + 20.50209 ≈ 22.47

where

β = 22.65 / 23.2 ≈ 0.97629

α = 12.9 - (0.97629 * 11.2) ≈ 12.9 - 10.93445 ≈ 1.96555

R-squared Score: Finally, the R-squared score tells us how closely the predictions of the model we have built follow the actual values.
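R-squared is defined as 1 − SS_res / SS_tot. Here SS_res is the sum of squared residuals on the evaluation data, and SS_tot is the sum of squared deviations of the actual values from their mean. For regressors, scikit-learn's score() returns this value.

The plain-Python sketch below fits the line with the closed-form formulas. It then evaluates R² on the validation pizzas used in the program that follows.

```python
x_train = [6, 8, 10, 14, 18]
y_train = [7, 9, 13, 17.5, 18]
x_test = [8, 9, 11, 16, 12]
y_test = [11, 8.5, 15, 18, 11]

def mean(v):
    return sum(v) / len(v)

# Fit y = alpha + beta * x on the training data (closed form).
mx, my = mean(x_train), mean(y_train)
beta = (sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
        / sum((a - mx) ** 2 for a in x_train))
alpha = my - beta * mx

# R^2 on the held-out data: 1 - SS_res / SS_tot.
pred = [alpha + beta * a for a in x_test]
ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_test, pred))
ss_tot = sum((yt - mean(y_test)) ** 2 for yt in y_test)
r2 = 1 - ss_res / ss_tot
print("R^2 =", r2)   # about 0.6620
```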


https://gist.github.com/nithyadurai87/a39ecee72dc4a266933621c298e80df9


from sklearn.linear_model import LinearRegression

x = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]

model = LinearRegression()
model.fit(x, y)

# Validation data that the model has not seen during training
x_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]

# score() returns the R-squared value of the predictions on x_test
print("Score = ", model.score(x_test, y_test))


R-squared score = 0.6620052929422553 

The score() function computes the R-squared value on the validation data we pass to it, that is, data the model was not trained on. The score is usually between 0 and 1. It can even be negative for a model that predicts worse than simply using the mean.

A score of exactly 1 means a perfect fit, which on real data can be a warning sign of over-fitting. Values close to 1 are generally considered good. Our model scores 0.66. Using multiple linear regression instead of simple linear regression can improve the accuracy, as we will see next.


9.2 Multiple Linear Regression

In simple linear regression we predicted the price from just one feature, the diameter. But a pizza's price also depends on other things, such as the number of toppings. So let us give the model both the diameter and the number of toppings as training data.

When there are several explanatory variables and one response variable, the method is called multiple linear regression. Its formula is as follows.


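$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \alpha + X_1 \beta \\ \alpha + X_2 \beta \\ \vdots \\ \alpha + X_n \beta \end{bmatrix}$$

$$\beta = (X^T X)^{-1} X^T Y$$

Here X_i is the row of feature values for the i-th example, and β is the vector of coefficients. When X includes an extra column of ones, the first entry of β gives α.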
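The normal equation can be checked directly with NumPy. This sketch assumes the two-feature pizza data used in the program below (diameter and number of toppings). It adds a column of ones so that the first entry of β is the intercept α.

It uses np.linalg.solve on (XᵀX)β = XᵀY rather than forming the inverse explicitly. That is numerically safer and gives the same result here.

```python
import numpy as np

# Training data: [diameter in inches, number of toppings] -> price in dollars
features = np.array([[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]], dtype=float)
Y = np.array([7, 9, 13, 17.5, 18])

# Prepend a column of ones so that beta[0] plays the role of alpha.
X = np.column_stack([np.ones(len(features)), features])

# Normal equation: solve (X^T X) beta = X^T Y instead of inverting X^T X.
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print("alpha, beta1, beta2 =", beta)   # about [1.1875, 1.0104, 0.3958]

# Predicted price of an 8-inch pizza with 2 toppings
print(np.dot(beta, [1, 8, 2]))         # about 10.0625
```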


The program below adds the number of toppings as a second explanatory variable alongside the diameter, and applies multiple linear regression.


https://gist.github.com/nithyadurai87/7068c32bd4d7fccb67ccca39623f68bc










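from sklearn.linear_model import LinearRegression
from numpy.linalg import lstsq
import numpy as np

# Training data: [diameter in inches, number of toppings]
x = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]

model = LinearRegression()
model.fit(x, y)

# Validation data
x1 = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
y1 = [[11], [8.5], [15], [18], [11]]

# The rest of this listing was cut off in the source; the lines below are a
# reconstruction chosen to reproduce the printed output that follows.
predictions = model.predict([[8, 2], [9, 0], [12, 0]])
print("values of Predictions: ", predictions)
# Least-squares coefficients of the two features, fitted without an intercept
print("values of E1, E2: ", lstsq(x, y, rcond=None)[0])
print("Score = ", model.score(x1, y1))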




The output shows that the accuracy has increased. The score was 0.66 with simple linear regression and is 0.77 with multiple linear regression. So, for this data, multiple linear regression gives better accuracy than simple linear regression.






values of Predictions: [[10.0625 ]
 [10.28125]
 [13.3125 ]]
values of E1, E2: [[1.08548851]
 [0.65517241]]
Score = 0.7701677731318468

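Now let us check how the model arrives at these predictions. From one pizza to the next, only the two variables change: x1 (diameter) and x2 (toppings). The intercept term α is a constant.

Note that E1 and E2 above match the least-squares coefficients of the two features fitted without an intercept term. They therefore cannot be combined with an α in y = α + β1x1 + β2x2: doing so gives a different α for each prediction. The model fitted by LinearRegression has α = 1.1875, β1 ≈ 1.010417 and β2 ≈ 0.395833, and these reproduce the predictions:

10.0625 ≈ 1.1875 + (1.010417 * 8) + (0.395833 * 2)
        = 1.1875 + 8.083336 + 0.791666

10.28125 ≈ 1.1875 + (1.010417 * 9) + (0.395833 * 0)
         = 1.1875 + 9.093753 + 0

13.3125 ≈ 1.1875 + (1.010417 * 12) + (0.395833 * 0)
        = 1.1875 + 12.125004 + 0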

9.3 Simple Linear Algorithm 


Let us now look at how simple linear regression works, step by step. Take the points (1,1), (2,2), (3,3). We call the predictor h(x), and we look for the straight line that fits these points best using the following formula:

$$h(x) = \theta_0 + \theta_1 x$$

Here θ0 (theta-0) and θ1 (theta-1) are the two key parameters; earlier we called them alpha and beta. Let us see, with an example, how the predictions change when these parameters take different values.

https://gist.github.com/nithyadurai87/c57ac1197368249f015ed4d1dba029f0


import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [1, 2, 3]

plt.figure()
plt.title('Data - X and Y')
plt.plot(x, y, '*')
plt.xticks([0, 1, 2, 3])
plt.yticks([0, 1, 2, 3])
plt.show()

def linear_regression(theta0, theta1):
    # Predict y for every x with the hypothesis h(x) = theta0 + theta1 * x
    predicted_y = []
    for i in x:
        predicted_y.append(theta0 + (theta1 * i))
    plt.figure()
    plt.title('Predictions')
    plt.plot(x, predicted_y, '.')
    plt.xticks([0, 1, 2, 3])
    plt.yticks([0, 1, 2, 3])
    plt.show()

theta0 = 1.5
theta1 = 0
linear_regression(theta0, theta1)

theta0a = 0
theta1a = 1.5
linear_regression(theta0a, theta1a)

theta0b = 1
theta1b = 0.5
linear_regression(theta0b, theta1b)


First, the points (1,1), (2,2), (3,3) are plotted. Then, with θ0 = 1.5 and θ1 = 0, the hypothesis produces the predictions (1, 1.5), (2, 1.5), (3, 1.5):

h(1) = 1.5 + 0(1) = 1.5
h(2) = 1.5 + 0(2) = 1.5
h(3) = 1.5 + 0(3) = 1.5

Similarly, θ0 = 0, θ1 = 1.5 gives (1, 1.5), (2, 3), (3, 4.5), and θ0 = 1, θ1 = 0.5 gives (1, 1.5), (2, 2), (3, 2.5):

h(1) = 0 + 1.5(1) = 1.5        h(1) = 1 + 0.5(1) = 1.5
h(2) = 0 + 1.5(2) = 3          h(2) = 1 + 0.5(2) = 2
h(3) = 0 + 1.5(3) = 4.5        h(3) = 1 + 0.5(3) = 2.5

Each set of predictions is drawn as a separate graph.


[Figure: Data - X and Y]

Of the three sets of predictions, (1, 1.5), (2, 2), (3, 2.5) comes closest to the actual points (1,1), (2,2), (3,3). So of the three lines, θ0 = 1, θ1 = 0.5 is the best choice.

But we cannot always judge this by looking at a graph. We need a way to measure, by calculation, how far the predicted values are from the actual ones. Then a program can find the parameter values that bring the predictions closest. The function that does this is called the cost function.



Cost Function: Its formula is as follows.

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$$

J = cost function
m = number of training examples
i = index that runs over the training examples, one at a time
h(x) = predicted value
y = actual value in the training data

Let us compute the cost of each of the three predictions above for the points (1,1), (2,2), (3,3), and see which one has the lowest cost. The code is as follows.


https://gist.github.com/nithyadurai87/86bd4ec2288d0e9af138a30a7af44a09

Output:

cost when theta0=1.5 theta1=0 : 0.4583333333333333
cost when theta0=0 theta1=1.5 : 0.5833333333333333
cost when theta0=1 theta1=0.5 : 0.08333333333333333
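The gist's code for this step is not reproduced here. The following is an independent minimal sketch of the same cost function, J = 1/(2m) Σ(h(x) − y)², evaluated for the three parameter pairs. It should print the same three values as the output above.

```python
# The (1,1), (2,2), (3,3) data points
x = [1, 2, 3]
y = [1, 2, 3]

def cost(theta0, theta1):
    """J = 1/(2m) * sum((h(x) - y)^2) with h(x) = theta0 + theta1 * x."""
    m = len(x)
    return sum((theta0 + theta1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

for t0, t1 in [(1.5, 0), (0, 1.5), (1, 0.5)]:
    print(f"cost when theta0={t0} theta1={t1} : {cost(t0, t1)}")
```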


Manual calculation:

(1, 1.5), (2, 1.5), (3, 1.5) vs (1, 1), (2, 2), (3, 3)   (θ0 = 1.5, θ1 = 0)

J = 1/(2*3) [(1.5-1)**2 + (1.5-2)**2 + (1.5-3)**2] = 1/6 [0.25 + 0.25 + 2.25] = 2.75/6 ≈ 0.458

(1, 1.5), (2, 3), (3, 4.5) vs (1, 1), (2, 2), (3, 3)   (θ0 = 0, θ1 = 1.5)

J = 1/(2*3) [(1.5-1)**2 + (3-2)**2 + (4.5-3)**2] = 1/6 [0.25 + 1 + 2.25] = 3.50/6 ≈ 0.583

(1, 1.5), (2, 2), (3, 2.5) vs (1, 1), (2, 2), (3, 3)   (θ0 = 1, θ1 = 0.5)

J = 1/(2*3) [(1.5-1)**2 + (2-2)**2 + (2.5-3)**2] = 1/6 [0.25 + 0 + 0.25] = 0.50/6 ≈ 0.083


In this calculation, we take the difference between each predicted value and the corresponding actual value, square it, and add the squares up. The differences can be positive or negative, so if we added them directly they could cancel each other out. Squaring makes every term positive, so all the errors count. This is why it is called the sum of squares error.

From the results, θ0 = 1, θ1 = 0.5 gives the lowest cost of the three (0.083). The lower the cost, the closer the predictions are to the actual values, so the parameters with the lowest cost are the best.

However, 0.083 is only the lowest of the three values we happened to try. To find the parameter values with the lowest possible cost, we would have to try a great many values of θ0 and θ1, which is not practical by hand. Gradient descent is the method that finds them automatically. Before that, let us use contour plots to see how the cost varies across many theta values.


Contour plots: A contour plot shows, as a map, the cost computed for many pairs of parameter values. Pairs of θ0 and θ1 that give the same cost are joined by a curve, forming rings around the point of lowest cost; the farther a ring is from the center, the higher its cost. The center of the innermost ring gives the θ0 and θ1 values with the lowest cost, so the goal is to reach that center.

In the example below, 100 values from -2 to 2 are taken for each of θ0 and θ1, and the cost is computed for every combination. These values are generated with numpy's linspace function, which spaces them evenly (uniformly) over the range. The program draws the resulting costs as a 3-D wireframe surface.



https://gist.github.com/nithyadurai87/8c120370181f5bb9ad966dc9fdd7935b


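from mpl_toolkits.mplot3d.axes3d import Axes3D  # registers the 3-D projection on older Matplotlib
import matplotlib.pyplot as plt
import numpy as np

fig, ax1 = plt.subplots(figsize=(8, 5), subplot_kw={'projection': '3d'})

values = 2
r = np.linspace(-values, values, 100)     # 100 evenly spaced values from -2 to 2
theta0, theta1 = np.meshgrid(r, r)        # every (theta0, theta1) combination

original_y = [1, 2, 3]
m = len(original_y)

# Predictions for x = 1, 2, 3 at every grid point
predicted_y = [theta0 + (theta1 * 1), theta0 + (theta1 * 2), theta0 + (theta1 * 3)]

total = 0
for i, j in zip(predicted_y, original_y):
    total = total + ((i - j) ** 2)

J = 1 / (2 * m) * total                   # cost at every grid point

ax1.plot_wireframe(theta0, theta1, J)
plt.show()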






The program draws the cost values as a 3-D graph.
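Instead of only plotting the surface, the same grid can be searched for its lowest cost. This sketch assumes the points (1,1), (2,2), (3,3) and the 100 × 100 grid over [-2, 2] used above. It computes J for every grid point at once with NumPy broadcasting. Because the grid is discrete, the minimum it finds is close to, but not exactly, θ0 = 0, θ1 = 1.

```python
import numpy as np

# The (1,1), (2,2), (3,3) data points
x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
m = len(y)

r = np.linspace(-2, 2, 100)
theta0, theta1 = np.meshgrid(r, r)

# Broadcasting: predictions for every grid point and every x, shape (100, 100, 3)
pred = theta0[..., None] + theta1[..., None] * x
J = ((pred - y) ** 2).sum(axis=-1) / (2 * m)

# Location of the smallest cost on the grid
i, j = np.unravel_index(np.argmin(J), J.shape)
print("lowest cost:", J[i, j])
print("theta0 =", theta0[i, j], "theta1 =", theta1[i, j])
```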






9.4 Gradient descent 

Gradient descent is an algorithm that automatically finds the theta values with the lowest cost by changing them step by step. It starts by giving the thetas some initial values and computing the cost. It then changes the theta values a little at a time, in the direction that lowers the cost, and repeats this until the cost stops decreasing. The update rule is as follows.


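$$\theta_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$$

$$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$$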



In each step of this rule, θ0 and θ1 are updated together: both new values are computed from the old ones before either is changed. This is called a simultaneous update.

It is like walking down a hill step by step until we reach the lowest point. Gradient descent moves in the same way toward the point where the cost is lowest, called the global optimum. With some cost functions, the algorithm can instead get stuck at a point where the cost is lower than at all nearby points but not the lowest overall; this is called a local optimum. For linear regression, however, the cost function has no local minimum other than the global optimum.
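Here is a minimal batch gradient-descent sketch for h(x) = θ0 + θ1x on the points (1,1), (2,2), (3,3), updating both parameters simultaneously. The starting point, α = 0.1 and the iteration count are assumptions chosen for illustration. With them, the parameters approach θ0 = 0, θ1 = 1, where the cost is 0.

```python
# The (1,1), (2,2), (3,3) data points
x = [1, 2, 3]
y = [1, 2, 3]
m = len(x)

theta0, theta1 = 1.0, 1.5   # assumed starting point
alpha = 0.1                 # assumed learning rate

for _ in range(2000):
    # Errors h(x) - y for the current parameters
    errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
    grad0 = sum(errors) / m                              # dJ/dtheta0
    grad1 = sum(e * xi for e, xi in zip(errors, x)) / m  # dJ/dtheta1
    # Simultaneous update: both gradients come from the old parameter values.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)   # close to 0 and 1
```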



Alpha is the learning rate: it determines how big a step the theta values take in each iteration. Its value should be neither too large nor too small. Commonly used values are 0.1, 0.01 and 0.001.



If alpha is too small, the steps are tiny and the algorithm takes a very long time to reach the lowest point. If alpha is too large, the steps are so big that the algorithm keeps jumping past the global optimum to the other side and may never reach it; the cost can even keep growing. So the value of alpha has to be chosen carefully.
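The effect of alpha can be seen by running the same gradient descent for a fixed number of steps with different learning rates. In this sketch (the starting point and step count are assumptions), a very small alpha barely lowers the cost and a moderate one lowers it a lot. A large one (0.5 here) overshoots so badly that the cost grows instead of shrinking.

```python
# The (1,1), (2,2), (3,3) data points
x = [1, 2, 3]
y = [1, 2, 3]
m = len(x)

def cost(t0, t1):
    return sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def run(alpha, steps=50, t0=1.0, t1=1.5):
    """Batch gradient descent for a fixed number of steps; returns the final cost."""
    for _ in range(steps):
        errors = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        g0 = sum(errors) / m
        g1 = sum(e * xi for e, xi in zip(errors, x)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return cost(t0, t1)

print("starting cost:", cost(1.0, 1.5))
for alpha in (0.001, 0.1, 0.5):
    print("alpha =", alpha, "-> cost after 50 steps:", run(alpha))
```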

After alpha, the update rule uses the partial derivative of the cost function. In simple linear regression, θ1 is the parameter that multiplies x (h(x) = θ0 + θ1x). The partial derivatives are computed as follows.


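$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) \qquad (\text{for } \theta_0)$$

$$\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x^{(i)} \qquad (\text{for } \theta_1)$$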
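One way to check partial-derivative formulas like these is to compare them with a numerical estimate from central differences, (J(θ + h) − J(θ − h)) / 2h. The sketch below does this at one hypothetical point (θ0 = 1, θ1 = 0.5) for the (1,1), (2,2), (3,3) data. The two estimates should agree to several decimal places.

```python
# The (1,1), (2,2), (3,3) data points
x = [1, 2, 3]
y = [1, 2, 3]
m = len(x)

def cost(t0, t1):
    return sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def analytic_grad(t0, t1):
    """The two partial-derivative formulas above."""
    errors = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
    return sum(errors) / m, sum(e * xi for e, xi in zip(errors, x)) / m

def numeric_grad(t0, t1, h=1e-6):
    """Central-difference estimates of the same partial derivatives."""
    d0 = (cost(t0 + h, t1) - cost(t0 - h, t1)) / (2 * h)
    d1 = (cost(t0, t1 + h) - cost(t0, t1 - h)) / (2 * h)
    return d0, d1

print(analytic_grad(1.0, 0.5))   # (0.0, -0.333...)
print(numeric_grad(1.0, 0.5))    # nearly the same values
```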



In the program below, θ0 is kept fixed and only θ1 is updated by gradient descent, for 50 iterations. After each iteration, the cost is computed and stored in a list called J_history. At the end, the smallest cost found is printed. The partial derivative for θ1 is stored in a variable called delta and is calculated as follows.


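$$\text{delta} = \frac{1}{m} \left( h(x) - y \right) x = \frac{1}{m} \left( \theta_1 x - y \right) x = \frac{1}{m} \left( x \cdot \theta_1 \cdot x - x \cdot y \right), \qquad \text{where } h(x) = \theta_1 x$$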

https://gist.github.com/nithyadurai87/43664cacd625e7c290c8812894dca659







# Training data: the points (1,1), (2,2), (3,3) used above
x = [1, 2, 3]
y = [1, 2, 3]
m = len(y)

theta0 = 1
theta1 = 1.5
alpha = 0.01

def cost_function(theta0, theta1):
    predicted_y = [theta0 + (theta1 * 1), theta0 + (theta1 * 2), theta0 + (theta1 * 3)]
    total = 0
    for i, j in zip(predicted_y, y):
        total = total + ((i - j) ** 2)
    J = 1 / (2 * m) * total
    return J

def gradientDescent(x, y, theta1, alpha):
    J_history = []
    for _ in range(50):
        for i, j in zip(x, y):
            # delta = 1/m * (x * theta1 * x - x * y), i.e. h(x) = theta1 * x
            delta = 1 / m * (i * i * theta1 - i * j)
            theta1 = theta1 - alpha * delta
        # The cost uses the fixed global theta0 (= 1)
        J_history.append(cost_function(theta0, theta1))
    print(min(J_history))

gradientDescent(x, y, theta1, alpha)


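Output:

0.5995100694321308

The cost stays well above 0 because θ0 is held at 1 throughout, while the update for θ1 treats h(x) as θ1·x. θ1 therefore moves toward 1, where the cost with θ0 = 1 is 0.5.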




As gradient descent runs through its iterations, the theta values keep moving toward the point of lowest cost, and J keeps decreasing. The closer it gets to the global optimum, the smaller the change in J from one iteration to the next. For example, suppose that after some 300 to 400 iterations, J changes by less than a very small amount (such as 0.001) per iteration. We can then take the algorithm to have reached the global optimum. This is called the automatic convergence test.
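An automatic convergence test can be sketched by stopping as soon as J decreases by less than a tolerance in one iteration. The tolerance of 0.001 follows the example in the text; the learning rate, starting point and iteration cap are assumptions. A looser tolerance stops earlier, farther from the optimum; a tighter one runs longer and gets closer.

```python
# The (1,1), (2,2), (3,3) data points
x = [1, 2, 3]
y = [1, 2, 3]
m = len(x)

def cost(t0, t1):
    return sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def gradient_descent(alpha=0.1, tol=0.001, max_iters=10_000):
    """Batch gradient descent that stops once J drops by less than `tol`
    in one iteration (an automatic convergence test)."""
    t0, t1 = 1.0, 1.5                      # assumed starting point
    prev = cost(t0, t1)
    for it in range(1, max_iters + 1):
        errors = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        g0 = sum(errors) / m
        g1 = sum(e * xi for e, xi in zip(errors, x)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
        current = cost(t0, t1)
        if prev - current < tol:           # J barely changed: treat as converged
            return t0, t1, current, it
        prev = current
    return t0, t1, prev, max_iters

theta0, theta1, final_cost, iterations = gradient_descent()
print("stopped after", iterations, "iterations with cost", final_cost)
```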


9.5 Matrix 

Matrices let us carry out calculations on many numbers quickly and efficiently. In simple linear regression we calculated each value one at a time. But in multiple linear regression, with many features and many records, doing each calculation separately would take a long time.

Multiplying a single feature value by a single parameter, as in simple linear regression, is easy. When many feature values must be multiplied by many parameters and added up, as in multiple linear regression, matrices let us do the whole calculation in one step.


Matrix:

The number of rows and columns in a matrix is called its dimension. For example, the matrix A below has 2 rows and 3 columns, so it is called a 2*3 dimensional matrix. An individual value in a matrix is referred to by its row number and column number. For example, A22 denotes the value 5, which lies in the 2nd row and the 2nd column.


Matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad (2 \times 3 \text{ matrix})$$

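In NumPy, a matrix's dimension is its `shape`, and elements are addressed by row and column. NumPy counts from 0, so the text's A22 is `A[1, 1]`. Here is a small sketch:

```python
import numpy as np

# The 2 x 3 matrix A from the text
A = np.array([[1, 2, 3],
              [4, 5, 6]])

rows, cols = A.shape
print(rows, "x", cols)   # dimension of A

# A22 (row 2, column 2, counting from 1) is A[1, 1] in 0-indexed NumPy
print(A[1, 1])
```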
In multiple linear regression the data is arranged as a matrix whose dimension is:

rows = no. of records
columns = no. of features

Vector:

A matrix that has only one column is called a vector. For example, B below is a 4-dimensional vector. B3 refers to the value in the third row, 38. A vector can be indexed in two ways, 1-indexed or 0-indexed. With 1-indexing B3 refers to 38; with 0-indexing it refers to 47.



Vector

$$B = \begin{bmatrix} 15 \\ 20 \\ 38 \\ 47 \end{bmatrix} \quad (4\text{-dimensional vector})$$

B3 = 38 (1-indexed)

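Python and NumPy use 0-based indexing. The text's 1-indexed B3 is therefore `B[2]`, while `B[3]` is the 0-indexed B3. The helper `element_1_indexed` below is hypothetical and only illustrates the two conventions:

```python
import numpy as np

# The 4-dimensional vector B from the text
B = np.array([15, 20, 38, 47])

def element_1_indexed(vec, i):
    """Return the i-th element using 1-indexing (hypothetical helper)."""
    return vec[i - 1]

print(element_1_indexed(B, 3))  # B3, 1-indexed
print(B[3])                     # B3, 0-indexed (NumPy's own convention)
```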
Matrix addition:

Two matrices can be added only when they have the same dimension. The values in corresponding positions are added, and the result is a matrix of that same dimension.


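If (3*2) + (3*2) = 3*2

Matrix Addition

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} + \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} = \begin{bmatrix} 1+7 & 2+8 \\ 3+9 & 4+10 \\ 5+11 & 6+12 \end{bmatrix} = \begin{bmatrix} 8 & 10 \\ 12 & 14 \\ 16 & 18 \end{bmatrix}$$

(3 x 2) + (3 x 2) = (3 x 2)
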
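The same addition in NumPy is a short sketch. `+` on two arrays of equal shape adds them element by element:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])

# Addition is only defined for matrices of the same dimension
assert A.shape == B.shape
C = A + B   # C[i, j] = A[i, j] + B[i, j]
print(C)
```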
Matrix multiplication:

Two matrices can be multiplied only when the number of columns of the first matrix equals the number of rows of the second. The resulting matrix has as many rows as the first matrix and as many columns as the second.

If (3*2) * (2*2) = 3*2

Each value in a row of the first matrix is multiplied by the corresponding value in a column of the second matrix. These products are then added together to give one value of the result, and this is repeated for every row-column pair.

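Matrix Multiplication

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \times \begin{bmatrix} 7 & 8 \\ 9 & 10 \end{bmatrix} = \begin{bmatrix} (1 \cdot 7)+(2 \cdot 9) & (1 \cdot 8)+(2 \cdot 10) \\ (3 \cdot 7)+(4 \cdot 9) & (3 \cdot 8)+(4 \cdot 10) \\ (5 \cdot 7)+(6 \cdot 9) & (5 \cdot 8)+(6 \cdot 10) \end{bmatrix} = \begin{bmatrix} 7+18 & 8+20 \\ 21+36 & 24+40 \\ 35+54 & 40+60 \end{bmatrix} = \begin{bmatrix} 25 & 28 \\ 57 & 64 \\ 89 & 100 \end{bmatrix}$$

(3 x 2) x (2 x 2) = (3 x 2)
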
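The row-by-column rule can be written out directly in a few lines. The function name `matmul` is just an illustrative helper, and NumPy's `@` operator gives the same result:

```python
import numpy as np

def matmul(A, B):
    """Multiply two matrices given as nested lists, using the row-by-column rule."""
    n_rows, n_inner = len(A), len(A[0])
    if n_inner != len(B):
        raise ValueError("columns of A must equal rows of B")
    n_cols = len(B[0])
    # the result has (rows of A) x (columns of B)
    return [[sum(A[i][k] * B[k][j] for k in range(n_inner)) for j in range(n_cols)]
            for i in range(n_rows)]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[7, 8], [9, 10]]          # 2 x 2
print(matmul(A, B))
print(np.array(A) @ np.array(B))
```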
Matrix transpose:

The matrix obtained by swapping the rows and columns of a matrix is called its transpose.


Matrix Transpose

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \qquad A^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$

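In NumPy the transpose is the `.T` attribute. In plain Python, `zip(*rows)` does the same row/column swap. A short sketch of both:

```python
import numpy as np

rows = [[1, 2], [3, 4], [5, 6]]          # 3 x 2
A_T = np.array(rows).T                   # 2 x 3: rows become columns
plain_T = [list(col) for col in zip(*rows)]
print(A_T)
print(plain_T)
```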
Inverse:

Computing the inverse of a matrix is somewhat more involved. For a matrix of dimension 2*2 the inverse is calculated as follows. First, 1 is divided by the difference between the product of A11 and A22 and the product of A12 and A21. This value then multiplies a matrix in which A11 and A22 have swapped places and A12 and A21 have had their signs reversed. The inverse exists only when A11A22 - A12A21 is not 0.

Matrix Inverse

$$A^{-1} = \frac{1}{A_{11}A_{22} - A_{12}A_{21}} \begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix}$$

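The 2*2 rule above translates directly into code. The helper `inverse_2x2` and the example matrix are illustrative only; `np.linalg.inv` is used as a cross-check:

```python
import numpy as np

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix using the swap-and-negate rule (illustrative helper)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("determinant is 0, so the matrix has no inverse")
    # swap A11 and A22, negate A12 and A21, multiply everything by 1/det
    return [[a22 / det, -a12 / det],
            [-a21 / det, a11 / det]]

A = [[4, 7], [2, 6]]          # determinant = 4*6 - 7*2 = 10
print(inverse_2x2(A))
print(np.linalg.inv(A))       # NumPy's general inverse, for comparison
```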
Identity Matrix:

A matrix with the same number of rows and columns, whose values along the diagonal are 1 and whose other values are all 0, is called an Identity Matrix. Multiplying a matrix by its inverse gives the Identity matrix.

Identity Matrix

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

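Both properties of the identity matrix can be checked numerically with `np.eye`. The example matrices are arbitrary. Floating-point rounding means the inverse product is compared with `np.allclose` rather than exact equality.

```python
import numpy as np

I = np.eye(3)                                   # 3 x 3 identity matrix
A = np.array([[4.0, 7.0], [2.0, 6.0]])

# A matrix times its inverse gives the identity (up to rounding)
product = A @ np.linalg.inv(A)
print(np.allclose(product, np.eye(2)))

# Multiplying by the identity leaves a matrix unchanged
M = np.arange(9).reshape(3, 3)
print(np.array_equal(M @ I, M))
```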
9.6 Multiple Linear Algorithm 


Predicting a value from more than one feature is called multiple linear regression. For features x1, x2, x3, ..., its equation is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

$$= \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n \quad (\text{added } x_0 = 1)$$

$$= \theta^T x$$




In multiple linear regression each feature gets its own θ value. The θ values are first written as a matrix with one row and several columns. Transposing it turns them into a vector with one column. Multiplying this transposed θ by the matrix X of features gives the equation for multiple linear regression.

We have also added an extra feature, x0, whose value is always 1, so that θ0 has a feature to multiply. Without it, there would be one more θ value than features, and the two matrices could not be multiplied.

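The x0 = 1 step can be sketched in NumPy by prepending a column of ones to the feature matrix before computing θ^T x for every row. The numbers below are the house data from the example that follows (square feet, rooms, age in years). The variable names are illustrative.

```python
import numpy as np

# Features without x0: square feet, rooms, age of the building in years
features = np.array([[800, 2, 15],
                     [1200, 3, 1],
                     [2400, 5, 5]])
theta = np.array([100, 1000, 10000, 100000])   # theta0 .. theta3

# Prepend x0 = 1 so the 4 theta values line up with 4 columns
X = np.hstack([np.ones((features.shape[0], 1)), features])

# h(x) = theta^T x, computed for every row at once
h = X @ theta
print(h)
```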
For example:

A house of 800 square feet with 2 rooms, in a building 15 years old, costs 3000000.

A house of 1200 square feet with 3 rooms, in a building 1 year old, costs 2000000.

A house of 2400 square feet with 5 rooms, in a building 5 years old, costs 3500000.

These values are given as the features matrix and the target vector. The values 100, 1000, 10000 and 100000 are given in the θ vector as θ0, θ1, θ2 and θ3. Multiplying them according to the equation above produces the h(x) vector of predicted prices.



https://gist.github.com/nithyadurai87/5abf51e4b26717a3427d15fcaca6f48f


import matplotlib.pyplot as plt
import numpy as np

# features, with x0 = 1 in the first column
x = np.array([[1, 800, 2, 15], [1, 1200, 3, 1], [1, 2400, 5, 5]])
# actual prices
y = np.array([3000000, 2000000, 3500000])

theta = np.array([100, 1000, 10000, 100000])

predicted_y = x.dot(theta.transpose())
print(predicted_y)

m = y.size

diff = predicted_y - y
squares = np.square(diff)
sum_of_squares = np.sum(squares)
cost_fn = 1/(2*m)*sum_of_squares
print(diff)
print(squares)
print(sum_of_squares)
print(cost_fn)


Output:

[2320100 1330100 2950100]

[-679900 -669900 -549900]

[462264010000 448766010000 302390010000]

1213420030000

202236671666.66666


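How the prediction is calculated:

The 3 x 4 feature matrix multiplied by the 4 x 1 θ vector gives the 3 x 1 vector of predictions h(x):

$$\begin{bmatrix} 1 & 800 & 2 & 15 \\ 1 & 1200 & 3 & 1 \\ 1 & 2400 & 5 & 5 \end{bmatrix} \times \begin{bmatrix} 100 \\ 1000 \\ 10000 \\ 100000 \end{bmatrix} = \begin{bmatrix} 2320100 \\ 1330100 \\ 2950100 \end{bmatrix}$$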




Cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

This is the same as in simple linear regression. The only difference is that h(x) is now computed as a matrix product.

Gradient descent:

This is also the same as in simple linear regression. There, the equation for updating θ0 has no x term, while the one for θ1 does. Here, because x0 = 1 has been added for θ0, the update equation has the same form for every θ:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

where

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad (\text{for all } j)$$

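With x0 = 1 included in X, the update rule above can be applied to all θ_j at once using matrix operations. Below is a minimal batch gradient descent sketch. The function name, toy data, learning rate and iteration count are illustrative assumptions, not values from the text.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent for multiple linear regression.

    X must already contain the x0 = 1 column, so theta0 uses
    the same update rule as every other theta_j.
    """
    m = len(y)
    for _ in range(iterations):
        error = X @ theta - y               # h(x^(i)) - y^(i) for every record
        gradient = (X.T @ error) / m        # dJ/dtheta_j for every j at once
        theta = theta - alpha * gradient    # update all theta_j together
    return theta


# toy data generated from y = 1 + 2 * x1
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, iterations=2000)
print(theta)   # close to [1, 2]
```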
Finding the minimum cost with an equation:

Gradient descent reaches the minimum cost by trying θ values step by step over many iterations. When the number of features is large, gradient descent is the better choice. When it is not, the θ values that give the minimum cost can be computed in a single step from the feature matrix X and its transpose, using the following equation (the normal equation):

$$\theta = (X^T X)^{-1} X^T y$$

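Here is a sketch of the normal equation in NumPy. Solving the linear system (X^T X)θ = X^T y with `np.linalg.solve` gives the same θ as the formula with the explicit inverse, and is generally the numerically safer choice. The toy data is made up so that the exact answer, θ = [3, 2, -1], is known.

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^-1 X^T y, computed as a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# first column is x0 = 1; data generated from y = 3 + 2*x1 - 1*x2
X = np.array([[1, 1, 2],
              [1, 2, 0],
              [1, 3, 1],
              [1, 4, 3]], dtype=float)
y = np.array([3.0, 7.0, 8.0, 8.0])

theta = normal_equation(X, y)
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y   # literal form of the formula
print(theta)
```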
Feature Scaling:

Each feature here contains numbers in a very different range. For example, the square footage of a house runs into the hundreds and thousands (800, 1200, 2400), whereas the number of rooms stays in single digits (2, 3, 5).

Square feet - 800, 1200, 2400

Rooms - 2, 3, 5

Feature scaling rescales the values of each column so that they no longer lie in different ranges, and instead all fall roughly between -1 and +1 (or between 0 and 1). One way to do this is mean normalization, whose formula is:

$$x' = \frac{\text{particular value} - \text{mean of all values}}{\text{maximum} - \text{minimum}}$$

Square feet: the mean of 800, 1200 and 2400 is 4400/3 ≈ 1466.67, and the range is 2400 - 800 = 1600, so

= (800 - 1466.67)/1600, (1200 - 1466.67)/1600, (2400 - 1466.67)/1600

= -0.42, -0.17, 0.58

Rooms: the mean of 2, 3 and 5 is 10/3 ≈ 3.33, and the range is 5 - 2 = 3, so

= (2 - 3.33)/3, (3 - 3.33)/3, (5 - 3.33)/3

= -0.44, -0.11, 0.56

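Mean normalization is a one-line NumPy expression per feature column. Here is a sketch that reproduces the worked numbers above; the helper name is illustrative.

```python
import numpy as np

def mean_normalize(values):
    """(value - mean) / (max - min) for one feature column."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / (values.max() - values.min())

square_feet = mean_normalize([800, 1200, 2400])   # about -0.42, -0.17, 0.58
rooms = mean_normalize([2, 3, 5])                  # about -0.44, -0.11, 0.56
print(np.round(square_feet, 2), np.round(rooms, 2))
```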
When gradient descent is used for multiple linear regression and the features lie in very different ranges, the plot of the cost against the parameters becomes badly stretched. As a result, gradient descent has a hard time reaching the centre (the minimum). Once the features are normalized, they all lie in a similar range, and reaching the centre becomes much easier.



10. Pandas 


Pandas is a Python library that is very useful for handling data. With it we can load data in formats such as CSV, txt and json into a dataframe and manipulate it however we need.

Here we use a CSV file that contains the various factors that help determine a house's sale price, together with the price itself. This is called the training data, and we will build our model from it.

Before building the model, the training data has to be cleaned up. This includes dropping columns with too many null values, filling the remaining null values with a suitable value, and removing unwanted columns, all of which can be done with Pandas. This step is called preprocessing / feature selection. The program is given below.



https://gist.github.com/nithyadurai87/5fd84f40ce26eac65a8060ee2d15280a


import pandas as pd

# data can be downloaded from the url:
# https://www.kaggle.com/vikrishnan/boston-house-prices
df = pd.read_csv('data.csv')
target = 'SalePrice'

# Understanding data
print(df.shape)
print(df.columns)
print(df.head(5))
print(df.info())
print(df.describe())
print(df.groupby('LotShape').size())

# Dropping null value columns which cross the threshold
a = df.isnull().sum()
print(a)
# note: len(a) is the number of columns; 0.05*len(df) would be 5% of the rows
b = a[a > (0.05*len(a))]
print(b)
df = df.drop(b.index, axis=1)
print(df.shape)

# Replacing null value columns (text) with most used value
a1 = df.select_dtypes(include=['object']).isnull().sum()
print(a1)
print(a1.index)
for i in a1.index:
    # values ordered from most to least frequent
    b1 = df[i].value_counts().index.tolist()
    print(b1)
    df[i] = df[i].fillna(b1[0])

# Replacing null value columns (int, float) with most used value
a2 = df.select_dtypes(include=['integer', 'float']).isnull().sum()
print(a2)
b2 = a2[a2 != 0].index
print(b2)
if len(b2) > 0:  # only fill when some numeric column actually has nulls
    df = df.fillna(df[b2].mode().to_dict(orient='records')[0])

# Creating new columns from existing columns
print(df.shape)
a3 = df['YrSold'] - df['YearBuilt']
b3 = df['YrSold'] - df['YearRemodAdd']
df['Years Before Sale'] = a3
df['Years Since Remod'] = b3
print(df.shape)

# Dropping unwanted columns
df = df.drop(["Id", "MoSold", "SaleCondition", "SaleType",
              "YearBuilt", "YearRemodAdd"], axis=1)
print(df.shape)

# Dropping columns which has correlation with target less than threshold
x = df.select_dtypes(include=['integer', 'float']).corr()[target].abs()
print(x)
df = df.drop(x[x < 0.4].index, axis=1)
print(df.shape)

# Checking for the necessary features after dropping some columns
# note: names containing spaces (e.g. "MS Zoning") do not match this CSV's
# column names (e.g. "MSZoning"), so only exact matches are picked up
l1 = ["PID", "MS Subclass", "MS Zoning", "Street", "Alley", "Land Contour",
      "Lot Config", "Neighborhood", "Condition 1", "Condition 2", "Bldg Type",
      "House Style", "Roof Style", "Roof Matl", "Exterior 1st", "Exterior 2nd",
      "Mas Vnr Type", "Foundation", "Heating", "Central Air", "Garage Type",
      "Misc Feature", "Sale Type", "Sale Condition"]
l2 = []
for i in l1:
    if i in df.columns:
        l2.append(i)

# Getting rid of nominal columns with too many unique values
for i in l2:
    if len(df[i].unique()) > 10:
        df = df.drop(i, axis=1)
print(df.columns)

df.to_csv('training_data.csv', index=False)


Program explanation:









Pandas loads the data in the csv file into a dataframe called df. df.shape shows how many rows and columns it has.

The following command shows which columns it contains.


print(df.columns)

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')


head(5) shows the first 5 rows.


print(df.head(5))

   Id  MSSubClass MSZoning  ... SaleType SaleCondition SalePrice
0   1          60       RL  ...       WD        Normal    208500
1   2          20       RL  ...       WD        Normal    181500
2   3          60       RL  ...       WD        Normal    223500
3   4          70       RL  ...       WD       Abnorml    140000
4   5          60       RL  ...       WD        Normal    250000

[5 rows x 81 columns]


info() shows details about the structure of the dataframe: the number of rows, and the non-null count and data type of each column.


print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64
...
SaleCondition    1460 non-null object
SalePrice        1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
None


describe() computes and shows summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for each numerical column.


print(df.describe())

                Id   MSSubClass  ...       YrSold      SalePrice
count  1460.000000  1460.000000  ...  1460.000000    1460.000000
mean    730.500000    56.897260  ...  2007.815753  180921.195890
std     421.610009    42.300571  ...     1.328095   79442.502883
min       1.000000    20.000000  ...  2006.000000   34900.000000
25%     365.750000    20.000000  ...  2007.000000  129975.000000
50%     730.500000    50.000000  ...  2008.000000  163000.000000
75%    1095.250000    70.000000  ...  2009.000000  214000.000000
max    1460.000000   190.000000  ...  2010.000000  755000.000000

[8 rows x 38 columns]


groupby() groups the rows by the values in a column. Combined with size(), it shows how many rows have each value (here, each LotShape).

The following shows the number of null values in each column.


print(a)

Id                  0
MSSubClass          0
MSZoning            0
LotFrontage       259
LotArea             0
Street              0
Alley            1369
LotShape            0
LandContour         0
Utilities           0
                 ...
PoolQC           1453
Fence            1179
MiscFeature      1406
MiscVal             0
MoSold              0
YrSold              0
SaleType            0
SaleCondition       0
SalePrice           0
Length: 81, dtype: int64





0.05 is the threshold for null values: up to 5 null values per 100 are allowed. Columns with more null values than this are identified and printed, and then dropped from the dataframe. In the code, the threshold is computed as 0.05*len(a), and len(a) is the number of columns (81). The effective cut-off is therefore about 4 null values per column, whereas 0.05*len(df) would give 5% of the 1460 rows.

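To apply the threshold as "5 nulls per 100 rows" as described, the count has to be compared against the number of rows, `len(df)`. Here is a small sketch on a made-up 20-row frame; the helper `drop_sparse_columns` is hypothetical.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_null_fraction=0.05):
    """Drop columns whose null count exceeds max_null_fraction of the rows."""
    null_counts = df.isnull().sum()
    too_sparse = null_counts[null_counts > max_null_fraction * len(df)]
    return df.drop(columns=too_sparse.index)

toy = pd.DataFrame({
    "full": list(range(20)),
    "one_null": [np.nan] + list(range(19)),            # 1/20 = 5%  -> kept
    "two_nulls": [np.nan, np.nan] + list(range(18)),   # 2/20 = 10% -> dropped
})
cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```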
BsmtQual 37 
BsmtCond 37 
BsmtExposure 38 
BsmtFinTypel 37 
BsmtFinType2 38 
FireplaceQu 690 
GarageType 81 
GarageYrBlt 81 
GarageFinish 81 
GarageQual 81 
GarageCond 81 
PoolQC 1453 
Fence 1179 
MiscFeature 1406
dtype: int64 


In total, 18 columns have been dropped, reducing the 81 columns to 63.





print (df.shape) 

(1460, 63) 

Next, the text columns are checked for null values (the columns with more nulls than the threshold have already been dropped). include=['object'] selects the text columns.


print(a1)

MSZoning         0
Street           0
LotShape         0
                ...
Electrical       1
KitchenQual      0
Functional       0
PavedDrive       0
SaleType         0
SaleCondition    0
dtype: int64

print(a1.index)

Index(['MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities',
       'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
       'Exterior2nd', 'ExterQual', 'ExterCond', 'Foundation', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional',
       'PavedDrive', 'SaleType', 'SaleCondition'],
      dtype='object')


For each of these columns, the code counts how often each value occurs and turns the values into a list ordered from most to least frequent. The first value in the list is therefore the most frequent one, and the null values in that column are filled with it.

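Here is a small sketch of the "fill with the most frequent value" step on a made-up column. `value_counts()` sorts by frequency (highest first) and skips nulls, so its first index entry is the most common value. `Series.mode()` gives the same answer.

```python
import pandas as pd

s = pd.Series(["RL", "RM", "RL", None, "FV", "RL"])

most_common = s.value_counts().index[0]   # most frequent non-null value
filled = s.fillna(most_common)
print(most_common)
print(filled.tolist())
print(s.mode()[0])                        # alternative way to get the same value
```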
print(b1)

['RL', 'RM', 'FV', 'RH', 'C (all)']
['Pave', 'Grvl']
['Reg', 'IR1', 'IR2', 'IR3']
...
['Y', 'N', 'P']
['WD', 'New', 'COD', 'ConLD', 'ConLI', 'ConLw', 'CWD', 'Oth', 'Con']
['Normal', 'Partial', 'Abnorml', 'Family', 'Alloca', 'AdjLand']


Next, any null values in the numerical columns (those below the threshold) are filled with the most frequent value of each column. include=['integer','float'] selects the numerical columns.


print(a2)

Id            0
MSSubClass    0
LotArea       0
             ...
MoSold        0
YrSold        0
SalePrice     0
dtype: int64

print(b2)

Index([], dtype='object')


Next, new columns are created from the values of existing columns. Two new columns are added to the dataframe: 'Years Before Sale' (YrSold - YearBuilt) and 'Years Since Remod' (YrSold - YearRemodAdd). After they are added, the 63 columns become 65.


print(df.shape)
(1460, 63)

print(df.shape)
(1460, 65)


Unwanted columns are then dropped from the dataframe, and the 65 columns become 59.



The correlation between each numerical column and the target column is computed and printed. Columns whose correlation is below the threshold of 0.4 are dropped from the dataframe.

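Here is a tiny illustration of the correlation filter on made-up numbers: one column is exactly proportional to the target and one is uncorrelated with it. Only the weakly correlated column falls below the 0.4 threshold and is dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "SalePrice": [1.0, 2.0, 3.0, 4.0],
    "strong": [2.0, 4.0, 6.0, 8.0],    # exactly proportional to the target
    "weak": [1.0, -1.0, -1.0, 1.0],    # uncorrelated with the target
})

corr_with_target = df.corr()["SalePrice"].abs()
to_drop = corr_with_target[corr_with_target < 0.4].index
reduced = df.drop(columns=to_drop)
print(corr_with_target)
print(list(reduced.columns))
```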
print (x) 

MSSubClass 0.084284 
LotFrontage 0.351799 
LotArea 0.263843 


SalePrice 1.000000 
Years Before Sale 0.523350 
Years Since Remod 0.509079 
Name: SalePrice, dtype: float64 


The print(df.shape) after this step shows that 38 columns remain. Next, the code checks which columns from a list of nominal (categorical) columns are still present in the dataframe, and drops those with more than 10 unique values. Finally, the remaining columns are printed.


print(df.columns)

Index(['MSZoning', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle',
       'OverallQual', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
       'ExterQual', 'ExterCond', 'TotalBsmtSF', 'HeatingQC', 'CentralAir',
       'Electrical', '1stFlrSF', 'GrLivArea', 'FullBath', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'GarageCars', 'GarageArea',
       'PavedDrive', 'SalePrice', 'Years Before Sale', 'Years Since Remod'],
      dtype='object')


Finally, the values remaining in the dataframe are saved as a .csv file named training_data. This file is the input for building the model, which is covered in the next section.





11. Model file handling 


11.1 Model Creation 

sklearn (sk for scikit) is a Python library for machine learning. For tasks such as classification and regression, it provides algorithms for many kinds of models, including linear models, ensembles and neural networks. Here we take its LinearRegression algorithm and build a model from the training data we prepared earlier. The program is given below.



https://gist.github.com/nithyadurai87/91e74160ccb4ff51eef3188372a78b91


import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the joblib package directly
import joblib
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from math import sqrt
import os

df = pd.read_csv('./training_data.csv')

# move the target column 'SalePrice' to the end
i = list(df.columns.values)
i.pop(i.index('SalePrice'))
df0 = df[i + ['SalePrice']]
# keep only the numerical columns
df = df0.select_dtypes(include=['integer', 'float'])
print(df.columns)

# every column except the last is a feature; the last is the target
X = df[list(df.columns)[:-1]]
y = df['SalePrice']

X_train, X_test, y_train, y_test = train_test_split(X, y)
regressor = LinearRegression()
regressor.fit(X_train, y_train)

y_predictions = regressor.predict(X_test)

meanSquaredError = mean_squared_error(y_test, y_predictions)
rootMeanSquaredError = sqrt(meanSquaredError)

print("Number of predictions:", len(y_predictions))
print("Mean Squared Error:", meanSquaredError)
print("Root Mean Squared Error:", rootMeanSquaredError)
print("Scoring:", regressor.score(X_test, y_test))

plt.plot(y_predictions, y_test, 'r.')
plt.plot(y_predictions, y_predictions)   # reference line: predicted = actual
plt.title('Parity Plot - Linear Regression')
plt.show()

plot = plt.scatter(y_predictions, (y_predictions - y_test), c='b')
plt.hlines(y=0, xmin=100000, xmax=400000)
plt.title('Residual Plot - Linear Regression')
plt.show()

joblib.dump(regressor, './salepricemodel.pkl')


Program output:

Index(['OverallQual', 'TotalBsmtSF', '1stFlrSF', 'GrLivArea', 'FullBath',
       'TotRmsAbvGrd', 'Fireplaces', 'GarageCars', 'GarageArea',
       'Years Before Sale', 'Years Since Remod', 'SalePrice'],
      dtype='object')

Number of predictions: 365

Mean Squared Error: 981297922.7884247

Root Mean Squared Error: 31325.675136993053

Scoring: 0.818899237738355




[Figure: Parity Plot - Linear Regression]

Program explanation:

1. The data in the training_data file is loaded into df.

2. The data used for prediction goes into X, and the value to be predicted, 'SalePrice', goes into y. Before that, the code moves the target column to the end of df. [:-1] then selects every column except the last for X, and the last column, 'SalePrice', is assigned to y.

3. fit() trains the model, and predict() makes the predictions.

4. score() returns a number showing how well our algorithm predicts. For LinearRegression this is the R² (coefficient of determination) on the test data.

5. train_test_split() splits our data into two parts, 75% and 25%. 75% of the data is used for training and 25% for testing the predictions.

6. Functions such as mean_squared_error and sqrt measure the difference between the values our algorithm predicts and the actual values. This difference is called the 'Residual Error', and it is drawn as a plot.

7. joblib saves our model as a .pkl file, called a pickle file. It is a binary file format that supports serialization and de-serialization. The next section shows how to use it to predict new data.

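Points 4 and 5 can be checked on synthetic data. With no `test_size` given, `train_test_split` puts 25% of the rows in the test set. For `LinearRegression`, `score()` equals `r2_score` on the same predictions. The data, seed and coefficients below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)

# default split: 75% training rows, 25% test rows
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

score = model.score(X_test, y_test)          # R^2 on the test data
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(len(X_test), score, rmse)
```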
11.2 Prediction 

We can now use the model saved in our file to predict the value for new data. The input data is given in a file called input.json.


cat input.json

{
"OverallQual":[7],
"TotalBsmtSF":[856],
"1stFlrSF":[856],
"GrLivArea":[1710],
"FullBath":[2],
"TotRmsAbvGrd":[8],
"Fireplaces":[0],
"GarageCars":[2],
"GarageArea":[548],
"Years Before Sale":[5],
"Years Since Remod":[5]
}


The program that runs predict() is given below.


https://gist.github.com/nithyadurai87/4a31b465220448ab05b84d2227e4e8a5







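import os
import json
import pandas as pd
import numpy
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the joblib package directly
import joblib

# read the input features
s = pd.read_json('./input.json')

# load the saved model
p = joblib.load("./salepricemodel.pkl")

r = p.predict(s)

print(str(r))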










[213357.65598157] 


The actual SalePrice for this house is 208500, and our program predicts 213357. That is reasonably close: our algorithm's score is about 81%, so small differences like this are to be expected.

Program explanation:

1. joblib.load() de-serializes the binary file and turns it back into the trained algorithm.

2. predict(), called on that model, takes the data read from the json file as input and outputs the predicted value.

Next, we will see how to expose this prediction, which takes inputs and returns an output, as a Rest API.


11.3 Flask API 

Flask is used to expose the value predicted by our algorithm as an API. The program is given below.


https://gist.github.com/nithyadurai87/9d04097e006e2fe6c7a96b1da643cb3a





import io
import os
import json
import pandas as pd
import numpy
from flask import Flask, render_template, request, jsonify
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the joblib package directly
import joblib

app = Flask(__name__)

port = int(os.getenv('PORT', 5500))

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/api/salepricemodel', methods=['POST'])
def salepricemodel():
    if request.method == 'POST':
        try:
            post_data = request.get_json()
            json_data = json.dumps(post_data)
            # wrap the JSON text in StringIO; newer pandas warns on literal JSON strings
            s = pd.read_json(io.StringIO(json_data))
            p = joblib.load("./salepricemodel.pkl")
            r = p.predict(s)
            return str(r)
        except Exception as e:
            # a Flask view cannot return an exception object, so return its message
            return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=port, debug=True)


Program output:

* Serving Flask app "flask_api" (lazy loading) 

* Environment: production 
WARNING: Do not use the development 
server in a production environment. 

Use a production WSGI server instead. 

* Debug mode: on 

* Restarting with stat 

* Debugger is active! 

* Debugger PIN: 690-746-333 

* Running on http://0.0.0.0:5500/ (Press 
CTRL+C to quit) 





We can test this API with a tool called postman.


POST http://localhost:5500/api/salepricemodel

Body (raw):

{
"OverallQual":[7],
"TotalBsmtSF":[856],
"1stFlrSF":[856],
"GrLivArea":[1710],
"FullBath":[2],
"TotRmsAbvGrd":[8],
"Fireplaces":[0],
"GarageCars":[2],
"GarageArea":[548],
"Years Before Sale":[5],
"Years Since Remod":[5]
}

Response:

[213357.65598157]

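The same request can also be sent from Python with only the standard library. The sketch below builds a POST request equivalent to the Postman call, using the URL and JSON body shown above. The helper name `build_prediction_request` is hypothetical. The actual send is left commented out because it needs the Flask server to be running.

```python
import json
import urllib.request

def build_prediction_request(url, features):
    """Build a JSON POST request for the prediction API (hypothetical helper)."""
    body = json.dumps(features).encode("utf-8")
    # Content-Type must be application/json so that Flask's get_json() parses the body
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

payload = {
    "OverallQual": [7], "TotalBsmtSF": [856], "1stFlrSF": [856],
    "GrLivArea": [1710], "FullBath": [2], "TotRmsAbvGrd": [8],
    "Fireplaces": [0], "GarageCars": [2], "GarageArea": [548],
    "Years Before Sale": [5], "Years Since Remod": [5],
}
req = build_prediction_request("http://localhost:5500/api/salepricemodel", payload)

# With the Flask server running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```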
11.4 Model comparison 




Linear regression is not the only algorithm we can use to build our model, so it is worth trying several others as well. The program below fits our data with several algorithms and prints the Score and RMSE of each, so that we can choose the best one.

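The program below defines one function per algorithm. The same comparison can be written more compactly as a loop over a dictionary of estimators that share one train/test split. This is an alternative sketch, not the author's program. The helper `compare_models`, the synthetic data, the shortened model list and `Ridge` without `normalize` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def compare_models(models, X, y, random_state=0):
    """Fit every model on the same split; return {name: (R^2 score, RMSE)}."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse = np.sqrt(mean_squared_error(y_test, pred))
        results[name] = (model.score(X_test, y_test), rmse)
    return results

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=0.3),
    "randomforest": RandomForestRegressor(n_estimators=15, random_state=0),
}
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 1.0, size=200)

results = compare_models(models, X, y)
for name, (score, rmse) in results.items():
    print(f"{name}: score={score:.3f} RMSE={rmse:.3f}")
```

Keeping one split for all models makes the scores directly comparable, and adding another algorithm only means adding one dictionary entry.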
https://gist.github.com/nithyadurai87/9ecfcbf04593d245e26316d52b0708e1


import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.ensemble import (RandomForestRegressor, AdaBoostRegressor,
                              ExtraTreesRegressor, GradientBoostingRegressor)
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split, cross_val_score
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the joblib package directly
import joblib
from sklearn.metrics import mean_squared_error
from azure.storage.blob import BlockBlobService
import matplotlib.pyplot as plt
from math import sqrt
import numpy as np
import os

df = pd.read_csv('./training_data.csv')

# move the target column 'SalePrice' to the end and keep numerical columns
i = list(df.columns.values)
i.pop(i.index('SalePrice'))
df0 = df[i + ['SalePrice']]
df = df0.select_dtypes(include=['integer', 'float'])

X = df[list(df.columns)[:-1]]
y = df['SalePrice']

X_train, X_test, y_train, y_test = train_test_split(X, y)

# each function returns (score, RMSE) on the test data

def linear():
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

# note: the normalize= argument of Ridge and Lasso was deprecated in
# scikit-learn 1.0 and removed in 1.2
def ridge():
    regressor = Ridge(alpha=.3, normalize=True)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def lasso():
    regressor = Lasso(alpha=0.00009, normalize=True)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def elasticnet():
    regressor = ElasticNet(alpha=1, l1_ratio=0.5)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def randomforest():
    # criterion='squared_error' is the current name of the former 'mse'
    regressor = RandomForestRegressor(n_estimators=15, min_samples_split=15,
                                      criterion='squared_error', max_depth=None)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for RandomForest", regressor.feature_importances_)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def perceptron():
    regressor = MLPRegressor(hidden_layer_sizes=(5000,), activation='relu',
                             solver='adam', max_iter=1000)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Co-efficients of Perceptron", regressor.coefs_)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def decisiontree():
    regressor = DecisionTreeRegressor(min_samples_split=30, max_depth=None)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for DecisionTrees", regressor.feature_importances_)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def adaboost():
    regressor = AdaBoostRegressor(random_state=8, loss='exponential')
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for Adaboost", regressor.feature_importances_)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def extratrees():
    regressor = ExtraTreesRegressor(n_estimators=50)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for Extratrees", regressor.feature_importances_)
    return (regressor.score(X_test, y_test),
            sqrt(mean_squared_error(y_test, y_predictions)))

def gradientboosting(): 
regressor = 

GradientBoostingRegressor(loss='1 
s',n_estimators=500, 
min_samples_split=15).fit(X_train 
, y_train) 

regressor.fit(X_train, 
y_train) 

y_predictions = 
regressor.predict(X_test) 

print("Selected Features for 
Gradientboosting",regressor.featu 
re_importances_) 




return 

(regressor.score(X_test, 

y_test),sqrt(mean_squared_error(y 

_test, y_predictions))) 

print ("Score, RMSE values") 
print ("Linear = ",linear()) 
print ("Ridge = ",ridge()) 
print ("Lasso = ",lasso()) 
print ("ElasticNet = 

",elasticnet()) 
print ("RandomForest = 

",randomforest()) 
print ("Perceptron = 

",perceptron()) 
print ("DecisionTree = 

",decisiontree()) 

print ("AdaBoost = ",adaboost()) 

print ("ExtraTrees = 



print ("GradientBoosting = 
", gradientboosting()) 


Output:

Score, RMSE values
Linear = (0.7437086925668539, 40067.32048747698)
Ridge = (0.7426559924644496, 40149.523137601194)
Lasso = (0.7437086997392647, 40067.31992682729)
ElasticNet = (0.7427716507607811, 40140.499909601196)
RandomForest = (0.7816174352942802, 36985.57224959144)
Perceptron = (0.7090884723574984, 42687.80529374248)
DecisionTree = (0.7205230305007451, 41840.45264436496)
AdaBoost = (0.7405881117926998, 40310.51057481991)
ExtraTrees = (0.8112271823246542, 34386.90514804029)
GradientBoosting = (0.770865727419495, 37885.095662535474)

Selected Features for RandomForest
[0.61070268 0.04279095 0.04336447 0.17066371 0.01107406 0.01329107
 0.0065515  0.03938371 0.02458596 0.02051551 0.01707638]

Selected Features for DecisionTrees
[0.75618387 0.03596786 0.02304119 0.13037245 0.0022674  0.
 0.00739768 0.01056845 0.01184136 0.01171254 0.01064719]

Selected Features for Adaboost
[0.38413232 0.18988447 0.03844386 0.12826885 0.03857277 0.03995005
 0.01059839 0.08066205 0.05036717 0.01473333 0.02438674]

Selected Features for Extratrees
[0.33168574 0.04675749 0.05913052 0.11159271 0.05178125 0.02947481
 0.03966461 0.16786223 0.06241882 0.05316226 0.04646956]

Selected Features for Gradientboosting
[0.04426232 0.16359645 0.14768597 0.25403034 0.02119119 0.04361512
 0.01825781 0.01626673 0.15891844 0.07188963 0.06028599]

Co-efficients of Perceptron
[array([[ 2.83519650e-01,  7.33024272e-03,  2.80373628e-01, ..., -1.43939606e-03, -3.84913926e-02],
       [ 1.34495184e-01,  1.31687141e-02,  1.72078666e-04, ...,  1.70666499e-23, -2.31494718e-02, -1.08758545e-02],
       [ 9.44490485e-02, -2.34835375e-02,  2.37798999e-02, ..., -1.74549692e-02, -2.70192753e-02, -3.67706290e-02],
       [ 1.59527225e-01, -3.19744701e-02, -1.22884400e-01, -2.35994429e-26, -3.03880584e-02, -2.85251050e-02],
       [-3.63149939e-01, -4.05674884e-02,  2.66679331e-01, -1.73628910e-02,  7.40224353e-03, -6.89871249e-03],
       [-4.30743882e-01,  7.07948777e-03,  3.34518179e-01, -1.74075111e-02,  3.47755293e-02, -2.64627071e-02]]),
 array([[ 0.16789784], [-0.01864141], [ 0.20432696], ..., [ 0.01739125], [-0.02779454], [-0.00476935]])]


Choosing a model

To choose a model, we must consider not only the Score and RMSE values but also factors such as the threshold limit and sensitivity. These, and each of the algorithms used above, are explained in more detail later in the book. In the run above, ExtraTrees gives the highest score (about 0.81) and the lowest RMSE (about 34387).

Only some of the algorithms above report which features they relied on for their predictions (through feature_importances_); linear, ridge, lasso and elasticnet do not. For such algorithms, the features have to be chosen with the RFE technique before they are passed in. This is called 'feature selection' and is covered in the Feature Selection chapter.

11.5 Improving Model Score

When the score of a model is low, we should draw graphs such as trend and parity plots to find out where its predictions go wrong. In the following example, the score of the model we built comes out at only about 0.35. So trend and parity plots are drawn to see how far the predicted values differ from the actual values.


Program and its output:


https://gist.github.com/nithyadurai87/ca54a4a8f59187cb988b5145d000c70c


import pandas as pd
import matplotlib.pyplot as plt
from math import sqrt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv('./training_data.csv')

X = df[list(df.columns)[:-1]]
y = df['SalePrice']
# To build the model from the first 600 records only, uncomment:
#X = X[:600]
#y = y[:600]

X_train, X_test, y_train, y_test = train_test_split(X, y)
regressor = LinearRegression()
regressor.fit(X_train, y_train)

y_predictions = regressor.predict(X_test)

meanSquaredError = mean_squared_error(y_test, y_predictions)
rootMeanSquaredError = sqrt(meanSquaredError)

print("Number of predictions:", len(y_predictions))
print("Mean Squared Error:", meanSquaredError)
print("Root Mean Squared Error:", rootMeanSquaredError)
print("Scoring:", regressor.score(X_test, y_test))

## TREND PLOT
# Actual vs predicted values for the first 35 test samples
y_test25 = y_test.iloc[:35]
y_predictions25 = y_predictions[:35]
myrange = [i for i in range(1, 36)]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.grid()
plt.plot(myrange, y_test25, marker='o')
plt.plot(myrange, y_predictions25, marker='o')
plt.title('Trend between Actual and Predicted - 35 samples')
ax.set_xlabel("No. of Data Points")
ax.set_ylabel("Values- SalePrice")
plt.legend(['Actual points', 'Predicted values'])
plt.savefig('TrendActualvsPredicted.png', dpi=100)
plt.show()

## PARITY PLOT
# Green line: predicted == actual; blue lines: actual +/- 50000
y_testp = y_test + 50000
y_testm = y_test - 50000
fig = plt.figure()
ax = fig.add_subplot(111)
ax.grid()
plt.plot(y_test, y_predictions, 'r.')
plt.plot(y_test, y_test, '-', color='green')
plt.plot(y_test, y_testp, color='blue')
plt.plot(y_test, y_testm, color='blue')
plt.title('Parity Plot')
ax.set_xlabel("Actual Values")
ax.set_ylabel("Predicted Values")
plt.legend(['Actual vs Predicted points', 'Actual value line',
            'Threshold of 50000'])
plt.show()

## Data Distribution
fig = plt.figure()
plt.plot([i for i in range(1, len(y) + 1)], y, 'r.')
plt.title('Data Distribution')
plt.show()

# Count the sale prices above 250000 (a) and the rest (b)
a = int((y > 250000).sum())
b = len(y) - a
print(a, b)


The trend plot shows how closely the prices predicted by the model follow the actual prices, here for the first 35 test samples.

Trend between Actual and Predicted - 35 samples



The parity plot sets a threshold around the actual values: here a prediction may be up to 50,000 above or below the actual price. The green line marks where the predicted value equals the actual value, and the two blue lines mark the 50,000 threshold on either side. The plot shows how many of the predicted prices fall within this threshold and how many fall beyond it.
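The band between the two blue lines can also be summarised as a single number: the share of predictions that lie within the threshold. A minimal sketch; the helper within_threshold and the sample prices are hypothetical:

```python
import numpy as np


def within_threshold(y_true, y_pred, threshold=50000):
    """Return the fraction of predictions within +/- threshold of the actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    inside = np.abs(y_pred - y_true) <= threshold   # True between the blue lines
    return inside.mean()


# Hypothetical prices: the third prediction misses by 60000
actual = [200000, 150000, 310000, 120000]
predicted = [230000, 140000, 250000, 160000]
print(within_threshold(actual, predicted))
```

On the real test split the call would be within_threshold(y_test, y_predictions).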









A data distribution chart is also drawn. Its X axis shows the 1460 rows and its Y axis shows the sale price of each row. Up to the first 600 records, the sale prices are spread widely, up to about 500,000, whereas from record 600 to record 1000 the sale prices all stay within about 200,000. This uneven spread is the reason for the model's low score. We saw earlier that training data should be spread evenly; here it is not. If the model is built only from records that are spread evenly, its score goes up. To try this, change X = df[list(df.columns)[:-1]] and y = df['SalePrice'] to X = X[:600] and y = y[:600] and run the program again, so that the model is built only from the first 600 records.




Output:

Number of predictions: 365
Mean Squared Error: 2312162517.277571
Root Mean Squared Error: 48084.95104788578
Scoring: 0.34729555622354125
97 1363


Finally, the program counts how many of the 1460 sale prices in the training data are above 250000 and how many are not. 1363 values are at or below 250000, and only 97 are above it, so this part of the data is not even either. Such rare, extreme values are called outliers. How to remove outliers is shown in a later section.




12. Feature Selection 


When the data has many columns, finding out which columns' values the thing we are predicting actually depends on is called feature selection. For example, when a dataset has 400 or 500 columns, picking out only the columns that really matter for the prediction is feature selection. To do this, we first divide our columns into three types: process variables, manipulated variables and disturbance variables. Of these, the manipulated and disturbance variables are input parameters, and the process variables are output parameters.

• Manipulated Variables (MV) - Columns whose values we can change. We can set them as we wish.

• Disturbance Variables (DV) - Columns whose values we cannot change directly, although they vary with the values of the manipulated variables.

• Process Variables (PV) - The values obtained as the result of the various processes.

The value of a process variable depends on the values of the other columns, so we do not change it directly.

After dividing the columns this way, we need to find how each variable is related to the other variables. This is called correlation. Its value always lies between -1 and +1. A value of -1 means the two are perfectly negatively related, and +1 means they are perfectly positively related.
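For reference, pandas DataFrame.corr() computes Pearson's correlation coefficient by default (method='pearson'). For two columns x and y with n rows and means x̄ and ȳ it is:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The Cauchy-Schwarz inequality guarantees that the numerator never exceeds the denominator in size, which is why r always stays between -1 and +1.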

For example, suppose we want to predict "body weight" from features such as "the amount of food eaten", "the time spent exercising" and "the time spent reading books". The values in their correlation matrix would then be as follows.

• Positive Correlation: as one value increases, the other also increases. For example, the correlation between body weight and the amount of food eaten comes out as +1 (or close to it): the more food eaten, the higher the body weight.

• Negative Correlation: as one value increases, the other decreases. For example, the correlation between body weight and the time spent exercising comes out as -1 (or close to it): the more time spent exercising, the lower the body weight.

• Zero Correlation: the two values are unrelated. For example, the correlation between body weight and the time spent reading books comes out as 0, because reading time has nothing to do with body weight.

• In general, depending on how two features are related to each other, their correlation takes a value between -1 and 1.
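The body-weight example can be tried with made-up numbers. In this minimal sketch, weight is constructed to rise with food, fall with exercise and ignore reading time; numpy.corrcoef then gives a clearly positive, a clearly negative and a near-zero coefficient. All values and coefficients here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical daily values for 200 people (illustration only)
food = rng.uniform(1500, 3500, n)      # calories eaten
exercise = rng.uniform(0, 120, n)      # minutes of exercise
reading = rng.uniform(0, 180, n)       # minutes spent reading
# Weight rises with food, falls with exercise and ignores reading (plus noise)
weight = 40 + 0.02 * food - 0.3 * exercise + rng.normal(0, 3, n)

for name, feature in [("food", food), ("exercise", exercise), ("reading", reading)]:
    r = np.corrcoef(feature, weight)[0, 1]   # off-diagonal entry = r(feature, weight)
    print(f"{name:9s} r = {r:+.2f}")
```

The coefficients are not exactly +1 or -1 because weight depends on both features and on noise; only a perfect straight-line relationship gives exactly +1 or -1.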


12.1 Highly Correlated features (MV - DV)

In the following example, we learn with the help of a domain expert which kind of parameter each column in the file data.csv is. Of the 26 features, named A to Z, the six columns A, B, C, D, E and F are process parameters, and the rest are manipulated and disturbance parameters. So the process parameters are first removed from the dataframe. Then the correlation among the manipulated and disturbance parameters is computed and written out both as a file and as a graph. Columns that have a very strong positive or negative correlation with another column are then removed from the dataframe: of any two features whose correlation is -0.98, -0.99, -1, 0.98, 0.99 or 1, one is dropped. In this way, the manipulated and disturbance variables that are highly correlated with each other are found, and one of each pair is removed. All the remaining columns are saved under the name training_data. The next step is to find how the process variable we want to predict is related to the selected manipulated and disturbance variables, and to remove the columns that are unrelated to it.


https://gist.github.com/nithyadurai87/5a43155d33cf5288204def23661704d0





import pandas as pd
import matplotlib.pyplot as plt
import numpy

df = pd.read_csv('./data.csv')

# Drop the process parameters; keep only manipulated and disturbance variables
df = df.drop(["A", "B", "C", "D", "E", "F"], axis=1)

# Finding correlation between manipulated & disturbance variables
correlations = df.corr()
correlations = correlations.round(2)
correlations.to_csv('MV_DV_correlation.csv', index=False)
fig = plt.figure()
g = fig.add_subplot(111)
cax = g.matshow(correlations, vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = numpy.arange(0, len(df.columns), 1)
g.set_xticks(ticks)
g.set_yticks(ticks)
g.set_xticklabels(list(df.columns))
g.set_yticklabels(list(df.columns))
plt.savefig('MV_DV_correlation.png')

# Removing parameters with high correlation:
# look only at the upper triangle (each pair once, diagonal excluded)
upper = correlations.where(
    numpy.triu(numpy.ones(correlations.shape), k=1).astype(bool))
cols_to_drop = []
for i in upper.columns:
    # values are rounded to 2 decimals, so |r| >= 0.98 covers
    # -1, -0.99, -0.98, 0.98, 0.99 and 1
    if (upper[i].abs() >= 0.98).any():
        cols_to_drop.append(i)

df = df.drop(cols_to_drop, axis=1)
print(df.shape, df.columns)





(20, 17) Index(['G', 'H', 'I', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],
      dtype='object')
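The program keeps only the upper triangle of the correlation matrix because the matrix is symmetric and its diagonal is always 1; scanning every cell would flag each column through its correlation with itself. A minimal sketch on made-up data (the columns a, b and c are hypothetical), where b is an exact multiple of a:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
a = rng.normal(size=50)
# Hypothetical columns: 'b' is exactly 2 * 'a'; 'c' is unrelated noise
df = pd.DataFrame({"a": a, "b": 2 * a, "c": rng.normal(size=50)})

corr = df.corr().round(2)
# Keep cells above the diagonal only (k=1 excludes the diagonal itself)
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
cols_to_drop = [col for col in upper.columns if (upper[col].abs() >= 0.98).any()]
print(cols_to_drop)
print(df.drop(columns=cols_to_drop).columns.tolist())
```

Only one column of a highly correlated pair is dropped: b is removed because its correlation with the earlier column a appears in b's column of the upper triangle, while a's column of the upper triangle is empty.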







































12.2 Zero Correlated features (PV - MV,DV)

Let us take "A" as the process parameter we want to predict. The training_data file also contains this column "A", and it is the input to the program below. The program finds the correlation between A and each of the other parameters and removes the MV and DV columns that show (almost) zero correlation with it. Here the columns whose absolute correlation with A is below 0.06 are removed; a larger threshold, such as 0.1, 0.2, 0.3, 0.4 or 0.5, would remove more columns.


https://gist.github.com/nithyadurai87/e0cca6ec864405a032888244122a90d8




import pandas as pd

df = pd.read_csv('./training_data.csv')
print(df.shape, df.columns)

# Dropping columns whose absolute correlation with the target is below the threshold
target = "A"
correlations = df.corr()[target].abs()
correlations = correlations.round(2)
correlations.to_csv('./PV_MVDV_correlation.csv', index=False)
df = df.drop(correlations[correlations < 0.06].index, axis=1)
print(df.shape, df.columns)
df.to_csv('./training.csv', index=False)


Output:




(20, 18) Index(['G', 'H', 'I', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'A'],
      dtype='object')

(20, 17) Index(['G', 'H', 'I', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'A'],
      dtype='object')


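Absolute correlation of each column with A (PV_MVDV_correlation.csv):

Row   Column   Correlation with A
1     G        0.14
2     H        0.32
3     I        0.84
4     K        0.62
5     M        0.06
6     N        0.78
7     P        0.14
8     Q        0.87
9     R        0.85
10    S        0.14
11    T        0.4
12    U        0.33
13    V        0.16
14    W        0.87
15    X        0.79
16    Y        0.81
17    Z        0.05
18    A        1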























12.3 Recursive Feature Elimination Technique

Now let us look at the RFE technique. Algorithms such as RandomForest, DecisionTree, AdaBoost, ExtraTrees and GradientBoosting can themselves tell us which features contribute most to the prediction. For algorithms such as linear regression, ridge, lasso and elasticnet, however, we have to select the features ourselves with a technique such as RFE. RFE takes the algorithm we give it, fits it repeatedly while removing the weakest feature each time, and assigns a rank to every feature. The features ranked 1 are the ones to select and use.
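RFE's ranking can be reproduced with a short loop, which makes the "recursive" part visible. This is a minimal illustration, not scikit-learn's own code: the helper simple_rfe is hypothetical; it uses the size of LinearRegression's coef_ as each feature's importance and removes one feature per round, which mirrors what RFE does by default for a linear model. It is checked against RFE on the same make_friedman1 data that the program below uses.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def simple_rfe(X, y, n_features_to_select=None):
    """Rank features by repeatedly dropping the one with the smallest |coef_|."""
    n_features = X.shape[1]
    if n_features_to_select is None:
        n_features_to_select = n_features // 2   # RFE's default: keep half
    support = np.ones(n_features, dtype=bool)    # features still in the model
    ranking = np.ones(n_features, dtype=int)
    while support.sum() > n_features_to_select:
        features = np.flatnonzero(support)
        model = LinearRegression().fit(X[:, features], y)
        weakest = features[np.argmin(np.abs(model.coef_))]
        support[weakest] = False
        ranking[~support] += 1    # every eliminated feature moves one rank down
    return ranking


X, y = make_friedman1(n_samples=20, n_features=17, random_state=0)
print(simple_rfe(X, y))
print(RFE(LinearRegression()).fit(X, y).ranking_)
```

Features removed earliest end up with the highest rank numbers, and the survivors all share rank 1. Because coefficient sizes depend on each column's units, real data is usually standardised before a linear model is used for this kind of ranking.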


https://gist.github.com/nithyadurai87/34ca5b0e8a9f5908276240eb099247ad


import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.datasets import make_friedman1

df = pd.read_csv('./training.csv')

X = df[list(df.columns)[:-1]]
y = df['A']

X_train, X_test, y_train, y_test = train_test_split(X, y)

regressor = DecisionTreeRegressor(min_samples_split=3, max_depth=None)
regressor.fit(X_train, y_train)
y_predictions = regressor.predict(X_test)

print("Selected Features for DecisionTree", regressor.feature_importances_)

# RFE Technique - Recursive Feature Elimination
# Note: make_friedman1 generates a synthetic dataset with 17 features and
# replaces the X, y read from training.csv above. To rank the columns of
# training.csv instead, skip this line and fit the selector on that X, y.
X, y = make_friedman1(n_samples=20, n_features=17, random_state=0)
selector = RFE(LinearRegression())
selector = selector.fit(X, y)
print("Selected Features for LinearRegression", selector.ranking_)


As the output shows, the feature_importances_ attribute works with the decision tree and gives an importance value for each feature, from which the features can be ranked. It does not work with linear regression, however, so for that model RFE is used to produce the ranking. We can then select and use only the features that are given rank 1.

Output:


Selected Features for DecisionTree
[9.52359304e-04 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 6.15147906e-03 2.23327627e-03 7.70622020e-02
 0.00000000e+00 0.00000000e+00 1.10263284e-03 2.33946020e-04
 0.00000000e+00 0.00000000e+00 9.12264104e-01 0.00000000e+00]

Selected Features for LinearRegression [ 1  1 10  1  1  9  8  3  5  2  6  7  1  1  1  4  1]



13. Outliers Removal 


An outlier is a value that stands apart from the rest of the data. For example, if a column whose values run 5, 10, 15, 20 ... 75 suddenly contains the number 15676, that value is an outlier. Such values have to be found and removed.

In the following example, the outliers in the given data are found column by column and shown as graphs. Either a boxplot or a violinplot can be used for this.


https://gist.github.com/nithyadurai87/1756b2a5ec421fc3f36add04909cc517


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_csv('./14_input_data.csv')

# Finding outliers in data: one box plot per column
for i in range(len(df.columns)):
    plt.figure()
    plt.boxplot(df[df.columns[i]])
    #plt.violinplot(df[df.columns[i]])
    plt.title(df[df.columns[i]].name)

# Collect every open figure (one per column, in creation order)
list1 = [plt.figure(num) for num in plt.get_fignums()]
print(list1)

# Save each figure under its column name, e.g. SalePrice.png
for i, j in enumerate(list1):
    j.savefig(df[df.columns[i]].name)

# Removing outliers
z = np.abs(stats.zscore(df.to_numpy()))
print(z)
print(np.where(z > 3))
print(z[53][9])
df1 = df[(z < 3).all(axis=1)]
print(df.shape)
print(df1.shape)


list1 holds the graphs drawn for each column.

print(list1)


[<Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
 <Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
 <Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
 <Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
 <Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>]

Then j.savefig() saves each column's graph under that column's own name. In the figure below, the left side is the violin plot for the 'SalePrice' column and the right side is the box plot for the same column. Either one shows where the outliers lie: here, SalePrice values above 300000 and below 100000 show up as outliers.



SalePrice 



Outliers can also be found automatically, without looking at graphs, using methods such as the Z score and the IQR score. The Z score measures how far a value lies from the mean of its column, in units of the column's standard deviation. Values that lie very far from the mean can be treated as outliers.
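Applied to the series from the outlier example (5, 10, ..., 75 followed by 15676), the Z score rule picks out the odd value. A minimal sketch, assuming the common threshold of 3 and the population standard deviation (ddof=0) that scipy.stats.zscore uses by default:

```python
import numpy as np
from scipy import stats

# The series from the outlier example: 5, 10, ..., 75 followed by 15676
values = np.array(list(range(5, 80, 5)) + [15676], dtype=float)

# z = (x - mean) / std, computed by hand and compared with scipy
z = (values - values.mean()) / values.std()
assert np.allclose(z, stats.zscore(values))

outliers = values[np.abs(z) > 3]
print(outliers)
print(round(z[-1], 2))
```

With the population standard deviation, no value in a series of n values can have a Z score larger in size than the square root of (n - 1), so on very short series (10 values or fewer) a threshold of 3 can never flag anything.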













For each column, the mean and standard deviation are computed, and z shows how far each value in that column lies from the mean, as printed below.


print(z) 

[[0.65147924 0.45930254 0.79343379 ... 0.31172464 0.35100032 0.4732471 ]
 [0.07183611 0.46646492 0.25714043 ... 0.31172464 0.06073101 0.01235858]
 [0.65147924 0.31336875 0.62782603 ... 0.31172464 0.63172623 0.74302803]
 ...
 [0.65147924 0.21564122 0.06565646 ... 1.02685765 1.03391416 0.23194227]
 [0.79515147 0.04690528 0.21898188 ... 1.02685765 1.09005935 0.23192429]
 [0.79515147 0.45278362 0.2416147  ... 1.02685765 0.9216238  0.2319063 ]]



The distance from the mean alone does not tell us which values are outliers; a threshold has to be set. Usually 3 is used as the threshold, which means that every value with a Z score above 3 is treated as an outlier. The positions of only these outliers are printed as follows.

print(np.where(z > 3)) 

(array([  53,   58,  112,  118,  151,  161,  166,  178,  178,  185,  185,
         185,  197,  224,  224,  224,  231,  278,  304,  309,  309,  313,
         321,  332,  336,  349,  375,  378,  389,  440,  440,  440,  473,
         477,  481,  496,  496,  496,  496,  515,  523,  523,  523,  527,
         529,  533,  581,  585,  591,  605,  608,  635,  635,  642,  664,
         691,  691,  691,  769,  769,  798,  803,  825,  897,  898,  910,
        1024, 1031, 1044, 1044, 1061, 1169, 1173, 1182, 1182, 1182, 1190,
        1230, 1268, 1298, 1298, 1298, 1298, 1298, 1298, 1350, 1353, 1373,
        1373, 1386], dtype=int64),
 array([9, 9, 9, 3, 9, 9, 6, 8, 9, 3, 5, 9, 3, 1, 2, 9, 9, 9, 3, 6, 9, 9,
        9, 1, 9, 9, 0, 9, 9, 1, 2, 9, 9, 9, 9, 1, 2, 3, 9, 9, 1, 2, 3, 9,
        2, 0, 8, 9, 9, 6, 3, 3, 5, 6, 8, 1, 2, 3, 3, 5, 3, 5, 8, 5, 2, 5,
        2, 5, 1, 2, 8, 3, 5, 1, 2, 3, 8, 5, 3, 1, 2, 3, 5, 6, 8, 5, 3, 1,
        2, 5], dtype=int64))



The output above contains two arrays. The first array() gives the row number of each outlier, and the second array() gives its column number. So print(z[53][9]) shows that the Z score of the value in row 53, column 9 is 3.647669390284779.

Finally, only the rows in which every value's Z score is below 3 are kept in a new dataframe; this is the data with the outliers removed.


df1 = df[(z < 3).all(axis=1)]



The original dataframe had 1460 rows, while the new dataframe has 1396 rows.


(1460, 10) 
(1396, 10) 



14. Exploratory Data Analysis


Studying in detail, through graphs, how our data is laid out is called Exploratory Data Analysis.

14.1 Univariate 

Studying the data of a single column on its own is called univariate analysis. Studying how the values of two columns relate to each other is called bivariate analysis. Studying how several columns together affect a target column is called multivariate analysis.

Histograms, density plots and box plots are the kinds of graph used for univariate analysis.

A histogram divides the values of a variable into a number of bins and shows how many values fall into each bin. For example, the 'GrLivArea' column holds the living area of each house in square feet. If we make bins of 500, 1000, 1500 ... 3000, the number of houses falling into each bin is drawn as a graph. Libraries such as matplotlib and seaborn provide these graphs. Here the histogram is drawn with matplotlib and the density plot with seaborn.
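The counting behind a histogram can be seen with numpy.histogram, which returns the number of values in each bin. A minimal sketch using the bin edges 500, 1000, ..., 3000 from the example and ten made-up living areas:

```python
import numpy as np

# Hypothetical living areas (sq ft) of ten houses
areas = np.array([650, 900, 1200, 1250, 1480, 1600, 1710, 2100, 2450, 2900])

# Bin edges 500, 1000, ..., 3000
edges = np.arange(500, 3001, 500)
counts, edges = np.histogram(areas, bins=edges)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} sq ft: {count} houses")
```

numpy counts each bin as including its left edge but not its right edge, except the last bin, which also includes 3000. Passing the same edges to matplotlib, as in ax.hist(areas, bins=edges), draws bars with these heights.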

A box plot is another kind of graph used to analyse a single variable. As the name suggests, it contains a box. The line in the middle of the box is the median, and the box itself spans the middle half of the data, from the first quartile to the third quartile. The lines above and below the box show how far the rest of the data is spread. The few scattered points seen beyond the ends of these lines are the outliers.
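Matplotlib's box plot uses the interquartile range (IQR) to decide where the whiskers end: with its default whis=1.5, points more than 1.5 × IQR beyond the box are drawn individually as outliers. A minimal sketch that computes the same quantities with numpy on made-up living areas:

```python
import numpy as np

# Hypothetical living areas (sq ft); 5600 is far from the rest
areas = np.array([850, 1000, 1100, 1200, 1300, 1400, 1500, 1650, 1800, 5600])

q1, median, q3 = np.percentile(areas, [25, 50, 75])
iqr = q3 - q1
low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers end at the most extreme values still inside the limits
inside = areas[(areas >= low_limit) & (areas <= high_limit)]
whisker_low, whisker_high = inside.min(), inside.max()
fliers = areas[(areas < low_limit) | (areas > high_limit)]

print("Q1, median, Q3:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", fliers)
```

This is also the rule behind the IQR score mentioned for outlier removal: values outside the two limits are the candidates to drop.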


https://gist.github.com/nithyadurai87/5be067164741348c6a51d6af6d8d78b7





import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("14_input_data.csv")
df = df.fillna(0)
df = df[:100]

# Histogram of the living area
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title="Total Living Sq.Ft",
       ylabel='No of Houses',
       xlabel='Living Sq.Ft')
ax.hist(df['GrLivArea'])

# Density plot on a second y axis (sns.distplot is deprecated;
# kdeplot with fill=True draws the same shaded density curve)
sns.kdeplot(df['GrLivArea'], ax=ax.twinx(), fill=True, linewidth=3)
plt.savefig('DensityPlot.jpg')

# Box plot of the living area
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title="Total Living Sq.Ft",
       ylabel='Living Sq.Ft')
ax.boxplot(df['GrLivArea'])
plt.savefig('BoxPlot.jpg')









Total Living Sq.Ft 











Total Living Sq.Ft 



14.2 Bivariate













Studying, through graphs, how two variables are related to each other is called bivariate analysis. Here one variable is placed on the X axis and the other on the Y axis. For example, a scatter plot or a heatmap can be drawn to see how the price of a house varies with its area in square feet. The heatmap is drawn in two ways: once with seaborn and once with matplotlib.

A scatter plot shows where the data lies as a set of points. Instead of dots, small circles or other shapes can also be used to mark the data.

A heatmap is a kind of graph used to draw two-dimensional data. Here it is drawn on a 12 × 12 figure. Each value in the matrix is shown by its own colour. In general, a heatmap helps us see where most of our data is concentrated. Heatmaps of the kinds provided by seaborn and matplotlib are shown here.
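A heatmap simply maps each cell of a matrix to a colour. A minimal sketch with matplotlib's imshow on a small made-up matrix, using the same 'BuPu' colour map as the seaborn heatmap in the program below:

```python
import matplotlib
matplotlib.use("Agg")                    # draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 4 x 3 matrix of values
matrix = np.arange(12).reshape(4, 3)

fig, ax = plt.subplots(figsize=(4, 4))
image = ax.imshow(matrix, cmap="BuPu")   # each cell coloured by its value
fig.colorbar(image, ax=ax)               # scale mapping colours back to values
ax.set_title("Heatmap sketch")
```

Larger values get darker shades of the colour map; fig.savefig(...) would write the picture to a file.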


https://gist.github.com/nithyadurai87/d93a853d86cf5500011cb41308dd1935




import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("14_input_data.csv")
df = df.fillna(0)
df = df[:500]

# Scatter plot: price against living area
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title='Living area vs Price of the house',
       xlabel='Price',
       ylabel='Area')
price = df['SalePrice'].tolist()
area = df['GrLivArea'].tolist()
ax.scatter(price, area)
plt.savefig('ScatterPlot.jpg')

# Heatmap with seaborn
df2 = pd.DataFrame()
df2['sale'] = df['SalePrice']
df2['area'] = df['GrLivArea']
fig = plt.figure(figsize=(12, 12))
r = sns.heatmap(df2, cmap='BuPu')
plt.savefig('HeatMapSeaborn.jpg')

# Heatmap with matplotlib: 2-D histogram of price against area
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title='Living area vs Price of the house',
       xlabel='Price',
       ylabel='Area')
ax.hist2d(price, area, bins=100)
plt.savefig('HeatMapMatplotlib.jpg')




[Figures: Scatter Plot; HeatMap - Seaborn; HeatMap - Matplotlib]

14.3 Multivariate


Studying how a target variable changes with respect to two or more other variables is called multi-variate analysis. Parallel coordinates is a kind of graph that helps visualise multi-dimensional data.

Here this graph is drawn with both plotly and matplotlib. 'SalePrice' is treated as the categorical (class) variable, and the graph shows how the other values vary with it. From this we can see whether the data follows any trend. When drawing with plotly, the min and max values of each column are given as that axis's range; the program saves these statistics to metrics.csv using df.describe(). The plotly graph is saved as an HTML file that can be explored interactively.
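The axis list does not have to be typed by hand. It can be built from the same min/max statistics. Below is a hedged sketch on a tiny made-up DataFrame. The helper name `make_dimensions` is hypothetical. The list it returns has the same shape as the `dimensions` argument used with `go.Parcoords`.

```python
import pandas as pd

# Tiny made-up data set with two of the columns used in the plot
df = pd.DataFrame({
    "OverallQual": [5, 7, 10, 1],
    "GarageCars": [0, 2, 4, 1],
})

def make_dimensions(frame):
    """Hypothetical helper: one parallel-coordinates axis per column,
    using the column's min and max (from describe()) as its range."""
    stats = frame.describe()          # includes 'min' and 'max' rows
    return [dict(range=[stats.loc["min", col], stats.loc["max", col]],
                 label=col,
                 values=frame[col])
            for col in frame.columns]

dims = make_dimensions(df)
for d in dims:
    print(d["label"], d["range"])
```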


https://gist.github.com/nithyadurai87/2b0bb469694d33c7d1472880f10f67e1


import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
import plotly
import plotly.graph_objs as go
import numpy as np

df = pd.read_csv("14_input_data.csv")

# pandas / matplotlib version: one line per row, grouped by SalePrice
parallel_coordinates(df, 'SalePrice')
plt.savefig('ParallelCoordinates.jpg')

# Summary statistics (min / max of each column give the axis ranges below)
desc_data = df.describe()
desc_data.to_csv('./metrics.csv')

X = df[list(df.columns)[:-1]]
y = df['SalePrice']

data = [
    go.Parcoords(
        # colour scale for the lines (any Plotly colorscale name works here)
        line=dict(colorscale='Jet',
                  showscale=True,
                  reversescale=True,
                  cmin=-4000,
                  cmax=-100),
        dimensions=list([
            dict(range=[1, 10],
                 label='OverallQual', values=df['OverallQual']),
            dict(range=[0, 6110],
                 label='TotalBsmtSF', values=df['TotalBsmtSF']),
            dict(tickvals=[334, 4692],
                 label='1stFlrSF', values=df['1stFlrSF']),
            dict(range=[334, 5642],
                 label='GrLivArea', values=df['GrLivArea']),
            dict(range=[0, 3],
                 label='FullBath', values=df['FullBath']),
            dict(range=[2, 14],
                 label='TotRmsAbvGrd', values=df['TotRmsAbvGrd']),
            dict(range=[0, 3],
                 label='Fireplaces', values=df['Fireplaces']),
            dict(range=[0, 4],
                 label='GarageCars', values=df['GarageCars']),
            dict(range=[0, 1418],
                 label='GarageArea', values=df['GarageArea']),
            dict(range=[34900, 555000],
                 label='SalePrice', values=df['SalePrice'])
        ])
    )
]

# Writes an interactive HTML file and opens it in the browser
plotly.offline.plot(data,
                    filename='./parallel_coordinates_plot.html',
                    auto_open=True)





[Figure: parallel coordinates plot with axes OverallQual, TotalBsmtSF, 1stFlrSF, GrLivArea, FullBath, TotRmsAbvGrd, Fireplaces, ...]

15. Polynomial Regression


When a straight line cannot fit the data well, polynomial regression can be used. The example below uses a small sample data set of house areas (sqft) and prices. It tries to fit a straight line (linear) and 2nd, 3rd, 4th and 5th order polynomials to that data.


https://gist.github.com/nithyadurai87/b7d3bf7733b5d4a8d2c8b2d1b8dcb531





import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = pd.DataFrame([100, 200, 300, 400, 500, 600], columns=['sqft'])
y = pd.DataFrame([543543, 34543543, 35435345, 34534, 34534534, 345345],
                 columns=['Price'])

# Straight-line (linear) fit
lin = LinearRegression()
lin.fit(X, y)

plt.scatter(X, y, color='blue')
plt.plot(X, lin.predict(X), color='red')
plt.title('Linear Regression')
plt.xlabel('sqft')
plt.ylabel('Price')
plt.show()

# 2nd to 5th order polynomial fits
for i in [2, 3, 4, 5]:
    poly = PolynomialFeatures(degree=i)
    X_poly = poly.fit_transform(X)      # columns: 1, x, x^2, ..., x^i
    lin2 = LinearRegression()
    lin2.fit(X_poly, y)
    plt.scatter(X, y, color='blue')
    plt.plot(X, lin2.predict(poly.transform(X)), color='red')
    plt.title('Polynomial Regression')
    plt.xlabel('sqft')
    plt.ylabel('Price')
    plt.show()


With linear regression, the straight line does not pass close to most of the data points. This is called underfitting.

[Figure: Linear Regression]



Next, in the 2nd order fit, the square of the feature is also added, and the curve tries to follow the data more closely. This is called a non-linear function, meaning it is no longer a straight line.

Then, in the 3rd order fit, the cube of the feature is added as well, and the curve fits the data even better.














With the 4th order, the curve tries to pass through every single point and becomes strongly non-linear. This is called overfitting, and overfitting is not good either.

Similarly, with the 5th order the curve bends even more so that it passes through all the points (again overfitting), and its predictions for new data can go badly wrong. Also, because squares, cubes and higher powers of a feature are computed, the values become very large, so feature scaling becomes important here.
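Below is a sketch of how large the raw polynomial columns get, and of one way to scale them. It puts `StandardScaler` between `PolynomialFeatures` and `LinearRegression` in a scikit-learn pipeline. This is an illustrative choice, not necessarily the book's own method. It reuses the sqft/price values from the example above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

X = np.array([[100], [200], [300], [400], [500], [600]], dtype=float)
y = np.array([543543, 34543543, 35435345, 34534, 34534534, 345345], dtype=float)

# Unscaled: the x**5 column for sqft = 600 is 600**5, roughly 7.8e13
raw = PolynomialFeatures(degree=5).fit_transform(X)
print(raw[:, -1].max())

# Scaled: every polynomial column is standardised to mean 0, standard deviation 1
model = make_pipeline(PolynomialFeatures(degree=5, include_bias=False),
                      StandardScaler(),
                      LinearRegression())
model.fit(X, y)
scaled = model[:-1].transform(X)   # output of the first two pipeline steps
print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```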



$$h(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \cdots$$
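The hypothesis above is still a linear combination of θ values. Only the inputs change, to 1, x, x², x³ and so on. Below is a small NumPy sketch with hypothetical θ values. It builds those columns with `np.vander` and evaluates h(x).

```python
import numpy as np

def poly_hypothesis(x, theta):
    """h(x) = theta0 + theta1*x + theta2*x**2 + ... for one feature x."""
    # increasing=True gives the columns x**0, x**1, ..., x**(len(theta)-1)
    design = np.vander(np.atleast_1d(x), N=len(theta), increasing=True)
    return design @ theta

theta = np.array([1.0, 2.0, 0.5, 0.1])          # hypothetical theta0..theta3
print(np.vander([2.0], N=4, increasing=True))    # the row [1, x, x^2, x^3] for x = 2
print(poly_hypothesis(2.0, theta))               # 1 + 2*2 + 0.5*4 + 0.1*8
```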


15.1 Underfitting - High bias



When the prediction line does not fit the training data well, it is called underfitting. This happens when a large amount of data is modelled with too few features. It is also called the high bias problem.

For example, if 50,000 houses are modelled with only two features, the line cannot fit the data properly. The solution is not to add more data; only the number of features needs to be increased.



15.2 Overfitting - High variance



We saw that underfitting can be avoided by adding more features. But if too many features are added, a different problem, overfitting, appears. Overfitting happens when the number of data points is small and the number of features is large.

For example, when 250 features are used for only 50 houses, the model tries to pass exactly through every data point, and it then predicts poorly on new data. This is called high variance. On the other hand, if the features are reduced too much, high bias occurs. This balance is called the bias-variance tradeoff. In such situations, either the number of features should be reduced to a suitable level, or regularization can be used.



15.3 Regularization 

Regularization shrinks the values of the parameters (θ) attached to each feature. So even when there are many features, each one contributes only a small amount to the prediction, and overfitting is reduced. For linear regression, the cost function changes as follows.

Linear regression:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$


Here λ (lambda) is the regularization parameter. It multiplies the sum of the squared parameters of all the features, with j running from 1 to n. The sum starts at 1 because x0 is always 1, as we saw earlier, so θ0 does not need to be shrunk.

The value of λ should be neither too large nor too small. If it is too small, overfitting cannot be avoided; if it is too large, high bias (underfitting) occurs. So it has to be set at the right level.
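The regularized cost above translates almost term by term into NumPy. The sketch below assumes that X already contains the x0 = 1 column and that `theta[0]` is left out of the penalty. The three data points are made up and lie exactly on y = 2x.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) [sum (h(x)-y)^2 + lam * sum_{j>=1} theta_j^2].
    X must already contain the x0 = 1 column; theta[0] is not penalised."""
    m = len(y)
    errors = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)
    return (np.sum(errors ** 2) + penalty) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # x0 = 1 plus one feature
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])                         # fits the data exactly

print(regularized_cost(theta, X, y, lam=0.0))  # perfect fit, no penalty
print(regularized_cost(theta, X, y, lam=3.0))  # 3 * 2**2 / (2*3)
```

Even a perfect fit now has a non-zero cost once λ > 0. That cost pushes θ1 towards smaller values.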

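When regularization is used with gradient descent, the update rules are as follows. θ0 is updated without regularization, and regularization is applied from θ1 onwards.

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)$$

which can be rearranged as

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

Since 1 - αλ/m is slightly less than 1, each step first shrinks θj a little and then applies the usual gradient descent update.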

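The normal equation finds the θ with the minimum cost directly. With regularization added, it becomes:

Normal Equation:

$$\theta = \left(X^T X + \lambda\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}\right)^{-1} X^T y$$

Here the matrix multiplied by λ is an (n+1)×(n+1) diagonal matrix. It has 0 in the top-left position, so θ0 is not regularized, and 1 everywhere else on the diagonal.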
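Below is a NumPy sketch of the regularized normal equation. It uses `np.linalg.solve` instead of forming the inverse explicitly, which gives the same θ and is numerically safer. The data are the same made-up points lying on y = 2x.

```python
import numpy as np

def normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^-1 X^T y, with L = diag(0, 1, ..., 1)."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0            # theta0 (the x0 = 1 column) is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(normal_equation(X, y, lam=0.0))   # plain least squares: close to [0, 2]
print(normal_equation(X, y, lam=1.0))   # the slope is pulled below 2
```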


16. Logistic regression 


When our prediction is not a continuous value but one of a set of categories, logistic regression is used. This classification is of two kinds: binary and multiclass. Logistic regression is a widely used classification algorithm. Although its name contains the word "regression", it is actually a classification algorithm.




16.1 Sigmoid function 


Will an event happen or not? Yes or no? Questions like these are predicted here, with "yes" represented as 1 and "no" as 0.

So the predicted value must lie between 0 and 1, and its graph is an S-shaped curve. The function g(z) takes any value of z and always returns a value between 0 and 1:

$$g(z) = \frac{1}{1 + e^{-z}}$$

This is called the sigmoid function.
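Below is a direct NumPy translation of g(z). It shows that the output is exactly 0.5 at z = 0 and approaches 0 or 1 for large negative or positive z.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); always returns a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                       # exactly 0.5
print(sigmoid(np.array([-10.0, 10.0])))   # close to 0 and close to 1
```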

So if the linear hypothesis θᵀx is put in place of z, the result always lies between 0 and 1:

$$h(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

This is the equation of logistic regression.

As an example, the program below predicts whether an email is spam or not.


https://gist.github.com/nithyadurai87/f09984303f976ca6eb8a64a4b7f0e391




import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# spam.csv: column 0 = label ('ham' / 'spam'), column 1 = message text
df = pd.read_csv('./spam.csv', delimiter=',', header=None)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(df[1], df[0])

# Convert the messages into TF-IDF feature vectors
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
print(predictions)


['ham' 'ham' 'ham'] 


16.2 Decision Boundary 

h(x) gives the probability of "yes" (1), so 1 - h(x) gives the probability of "no". For example, if h(x) predicts a 70% chance of rain tomorrow, the remaining 30% is the chance that it will not rain.

When the data are spread out, we can predict "yes" for points on one side of a line and "no" for points on the other side. The line that separates them is called the decision boundary, and it is always determined by the θ values. For example, suppose θ0, θ1 and θ2 are -3, 1 and 1. Substituting them into the equation shows that y = 1 is predicted when x1 + x2 is at least 3, which gives the decision boundary x1 + x2 = 3.



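Linear Classification

$$h(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2), \qquad \theta = \begin{bmatrix}-3 \\ 1 \\ 1\end{bmatrix}$$

hence

$$y = 1 \text{ is predicted if } -3 + x_1 + x_2 \ge 0, \text{ i.e. if } x_1 + x_2 \ge 3$$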
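Below is a sketch of this linear decision boundary with θ = [-3, 1, 1] from the text. h(x) ≥ 0.5 exactly when θᵀx ≥ 0, so the prediction only needs the sign of θᵀx.

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])        # theta0, theta1, theta2 from the text

def predict(x1, x2):
    """Predict 1 when theta^T x >= 0, i.e. when h(x) = g(theta^T x) >= 0.5."""
    z = theta @ np.array([1.0, x1, x2])   # x0 = 1
    return int(z >= 0)

print(predict(1, 1), predict(2, 2))       # x1 + x2 = 2 -> 0, x1 + x2 = 4 -> 1
```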


If the data are spread in a non-linear way, the θ values -1, 0, 0, 1, 1 are substituted into a 2nd order polynomial equation, h(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2²). This predicts 1 when x1² + x2² ≥ 1, so the circle x1² + x2² = 1 becomes the decision boundary. This is also called a threshold classifier.
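Below is a sketch of the circular boundary with θ = [-1, 0, 0, 1, 1]. The feature order [1, x1, x2, x1², x2²] is an assumption matching the equation above.

```python
import numpy as np

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])   # theta0..theta4 from the text

def predict(x1, x2):
    """Assumed feature order: [1, x1, x2, x1**2, x2**2]."""
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return int(theta @ features >= 0)           # 1 on or outside the unit circle

print(predict(0.0, 0.0), predict(0.5, 0.5), predict(1.0, 1.0))  # inside, inside, outside
```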








16.3 Cost function 

In reality, if rain that should have been predicted as 1 ("yes") is predicted as "no", that is an error. Likewise, predicting rain when it does not rain is also an error. The cost function measures how costly such wrong predictions are, whether 1 is predicted as 0 or 0 is predicted as 1. This cost can grow without limit, towards infinity.

When the actual value is 1, the cost is -log(h(x)). In its graph the X-axis is h(x). The cost is 0 when h(x) = 1 and rises towards infinity as h(x) approaches 0, i.e. as the model predicts 0 for something that is really 1.

Likewise, when the actual value is 0, the cost is -log(1 - h(x)). It is 0 when h(x) = 0 and rises towards infinity as h(x) approaches 1.

So the complete cost equation is as follows; it covers both y = 1 and y = 0.


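$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

When y = 1,

= y·log(h(x)) + (1-y)·log(1-h(x))
= 1·log(h(x)) + (1-1)·log(1-h(x))
= log(h(x)) + 0
= log(h(x))

When y = 0,

= y·log(h(x)) + (1-y)·log(1-h(x))
= 0·log(h(x)) + (1-0)·log(1-h(x))
= 0 + 1·log(1-h(x))
= log(1-h(x))

With the minus sign in front of the sum, these give exactly the costs -log(h(x)) and -log(1-h(x)) described above.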
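Below is a NumPy version of the cost J for a batch of predictions. The same prediction, h(x) = 0.9, costs little when y = 1 and a lot when y = 0. Note that h values of exactly 0 or 1 would make the logarithm infinite.

```python
import numpy as np

def logistic_cost(h, y):
    """J = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]"""
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

print(logistic_cost([0.9], [1]))   # -log(0.9): confident and correct, small cost
print(logistic_cost([0.9], [0]))   # -log(0.1): confident but wrong, large cost
```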

Suppose the squared-error cost of linear regression were used here instead. Its contour plot would not be a single bowl; it would have many bends, going up and down. Such a function is called non-convex. For regression, the graph has only one global optimum. For classification, however, h(x) passes through the sigmoid and the outputs are only the two values 1 and 0 ("yes" or "no"), so that graph has many local optimums. The log-based cost above does not have this problem, so gradient descent can be used to minimise it.

The gradient descent equation for this is the same as for multiple linear regression. The only difference is that h(x) is no longer θᵀx itself: θᵀx is first passed through the sigmoid function.
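Below is a sketch of batch gradient descent for logistic regression. The update has the same form as for linear regression, but with h = sigmoid(Xθ). The one-feature data set is made up and symmetric around 0. The learning rate and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent; same update as linear regression, but h = sigmoid(X @ theta)."""
    X = np.column_stack([np.ones(len(x)), x])   # add the x0 = 1 column
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iterations):
        h = sigmoid(X @ theta)
        theta -= alpha * (X.T @ (h - y)) / m
    return theta

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])   # made-up feature values
y = np.array([0, 0, 0, 1, 1, 1])
theta = train_logistic(x, y)
pred = (sigmoid(theta[0] + theta[1] * x) >= 0.5).astype(int)
print(theta, pred)
```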


16.4 Classification accuracy

If there really is a chance of rain and the model predicts "yes", or there is no chance and it predicts "no", the classification is correct. The proportion of correct predictions out of all predictions is called accuracy.

The example below explains this with sample values. y_true holds whether it really rained (1) or not (0), and the model's predictions are in y_pred. Comparing the two, the 2nd, 6th and 7th predictions differ from the real values. So out of 10 data points, 3 are wrong, and the accuracy is 70%.
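Below is the same accuracy calculation written out by hand in plain Python. It lists the mismatched positions (1-based) and divides the number of correct predictions by the total.

```python
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 0, 0, 0, 1, 1, 1]

# 1-based positions where the prediction differs from the real value
wrong = [i + 1 for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
accuracy = (len(y_true) - len(wrong)) / len(y_true)
print(wrong, accuracy)   # [2, 6, 7] 0.7
```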



https://gist.github.com/nithyadurai87/7668ce262ed9070d89b158bb7f13c5cb


from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # what really happened
y_pred = [0, 1, 0, 0, 0, 0, 0, 1, 1, 1]   # what the model predicted

print('Accuracy:', accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(precision_recall_fscore_support(y_true, y_pred))

plt.matshow(confusion_matrix(y_true, y_pred))
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()


Accuracy: 0.7
[[4 1]
 [2 3]]
(array([0.66666667, 0.75]), array([0.8, 0.6]), array([0.72727273, 0.66666667]), array([5, 5], dtype=int64))


16.5 Confusion Matrix 

It is laid out as follows:

A 0 value predicted as 1 = False Positive
A 1 value predicted as 0 = False Negative
A 1 value predicted as 1 = True Positive
A 0 value predicted as 0 = True Negative

                Predicted Yes    Predicted No
Actual Yes      TP               FN
Actual No       FP               TN

scikit-learn's confusion_matrix sorts the labels as 0, 1, so its output [[4 1] [2 3]] above is laid out as [[TN FP] [FN TP]]: TN = 4, FP = 1, FN = 2, TP = 3.










16.6 Precision, Recall & F1 score

Precision (P) tells us how many of the cases predicted as "yes" really are "yes". Recall (R) tells us how many of the cases that really are "yes" were predicted as "yes", rather than wrongly predicted as "no". Combining these two values into a single number gives the F score (F1 score). The formulas are:

P = True Positive / (True Positive + False Positive)

R = True Positive / (True Positive + False Negative)

F score = 2PR / (P + R)
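The three formulas can be checked against the class-1 counts of the earlier example (TP = 3, FP = 1, FN = 2). The results, 0.75, 0.6 and about 0.667, match the second entries of the precision_recall_fscore_support output shown there.

```python
def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=2)
print(p, r, round(f1, 4))
```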


Let us see why these need to be computed at all. Suppose a machine learning model decides, from its size, whether a tumour in the body is cancerous. In its sample data, only 1 case in 100 may have the result "yes"; most of the data will have the result "no". Data in which the classes are this unbalanced are called "skewed classes". A model trained on such data will mostly predict "no" rather than "yes", and it can still reach a high accuracy.

Precision and recall help detect this.
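Below is a hypothetical illustration of why accuracy alone misleads on skewed classes. There is 1 positive case in 100, and a "model" always answers "no". Accuracy is 99%, yet recall is 0.

```python
y_true = [1] + [0] * 99   # hypothetical: 1 cancer case in 100 patients
y_pred = [0] * 100        # a "model" that always answers 'no'

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = correct / len(y_true)
recall = tp / (tp + fn)
print(accuracy, recall)   # 0.99 0.0
# precision = tp / (tp + fp) would be 0 / 0 here: the model never says 'yes'
```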

16.7 Trading-off between Precision & Recall

Say the model calls a tumour cancerous when h(x) is 0.5 or more, so the threshold is 0.5. If a patient without cancer is wrongly told they have cancer, they will go through many unnecessary treatments (a false positive). To avoid this, we say "yes" only when we are really sure, for example by raising the threshold to 0.7. This gives high precision.

But then a patient with, say, a 0.6 chance of cancer is told "no" and may not get treatment in time, which is also dangerous (a false negative). Lowering the threshold reduces such cases and gives high recall.

So increasing precision lowers recall, and increasing recall lowers precision. This is called the trade-off between precision and recall.
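Below is a sketch of the trade-off on made-up labels and h(x) values. Raising the threshold from 0.3 to 0.5 to 0.7 increases precision (about 0.57, 0.60, 0.67) and decreases recall (1.00, 0.75, 0.50).

```python
y_true = [0, 0, 0, 0, 1, 1, 1, 1]                  # made-up labels
h = [0.1, 0.35, 0.6, 0.8, 0.4, 0.65, 0.75, 0.9]    # made-up h(x) values

def precision_recall_at(threshold):
    """Predict 'yes' (1) when h(x) >= threshold, then score the predictions."""
    pred = [1 if p >= threshold else 0 for p in h]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

results = {t: precision_recall_at(t) for t in (0.3, 0.5, 0.7)}
for t, (p, r) in results.items():
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```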



17. Multi-class classification


So far we have seen only two classes, 0 and 1. When there are many classes, predicting which class new data belongs to is called multi-class classification. One logistic computation is carried out per class. Then, for each new data point, all of them are computed and the class that fits best is taken as the answer.

In the example below there are data in four colour categories: red, violet, green and yellow.

[Figure: data points in four colour classes]

First, a hypothesis is formed for predicting red. Here h(x) = 1 denotes red, and all the others, which are not red, are denoted by 0.

Next, a hypothesis is formed for predicting violet. Here h(x) = 1 denotes violet, and everything that is not violet is denoted by 0.

In the same way, a hypothesis is formed for each colour.

[Figures: one hypothesis per colour, each separating that colour from all the others]


Then, when a new data point comes in, suppose the probability of it being red is 30%, violet 40%, green 60% and yellow 50%. It is placed in the class with the highest probability, here green. This is multi-class classification.
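Below is a sketch of the one-vs-rest idea. Each class gets its own 1/0 target list, and a new point goes to the class whose hypothesis gives the highest probability. The training labels are made up; the four probabilities are the ones in the text.

```python
import numpy as np

classes = ["red", "violet", "green", "yellow"]

# One-vs-rest training targets: for each class, 1 = that class, 0 = all others
labels = ["red", "green", "violet", "red", "yellow"]   # made-up training labels
targets = {c: [1 if lab == c else 0 for lab in labels] for c in classes}
print(targets["red"])       # [1, 0, 0, 1, 0]

# Probabilities from the four hypotheses for one new point (values from the text)
scores = np.array([0.30, 0.40, 0.60, 0.50])
prediction = classes[int(np.argmax(scores))]
print(prediction)           # green
```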











Decision tree, Gaussian NB, KNN and SVC are some of the algorithms that support multi-class problems. Deciding which type a flower belongs to is an example of multi-class classification; the flowers.csv data used here has three types. The program below classifies the flowers with each of these algorithms and compares their scores, precision and recall to find the best one.

https://gist.github.com/nithyadurai87/aaded978eb7e545006ed6117c97b86b3


from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Last column 'Flower' is the class label; the other columns are features
df = pd.read_csv('./flowers.csv')
X = df[list(df.columns)[:-1]]
y = df['Flower']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Decision tree
tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
tree_predictions = tree.predict(X_test)
print(tree.score(X_test, y_test))
print(confusion_matrix(y_test, tree_predictions))
print(precision_recall_fscore_support(y_test, tree_predictions))

# Support vector classifier with a linear kernel
svc = SVC(kernel='linear', C=1).fit(X_train, y_train)
svc_predictions = svc.predict(X_test)
print(svc.score(X_test, y_test))
print(confusion_matrix(y_test, svc_predictions))
print(precision_recall_fscore_support(y_test, svc_predictions))

# k-nearest neighbours (k = 7)
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
knn_predictions = knn.predict(X_test)
print(knn.score(X_test, y_test))
print(confusion_matrix(y_test, knn_predictions))
print(precision_recall_fscore_support(y_test, knn_predictions))

# Gaussian naive Bayes
gnb = GaussianNB().fit(X_train, y_train)
gnb_predictions = gnb.predict(X_test)
print(gnb.score(X_test, y_test))
print(confusion_matrix(y_test, gnb_predictions))
print(precision_recall_fscore_support(y_test, gnb_predictions))

Output:

0.8947368421052632
[[15  1  0]
 [ 3  6  0]
 [ 0  0 13]]
(array([0.83333333, 0.85714286, 1. ]), array([0.9375, 0.66666667, 1. ]), array([0.88235294, 0.75, 1. ]), array([16, 9, 13], dtype=int64))

0.9736842105263158
[[15  1  0]
 [ 0  9  0]
 [ 0  0 13]]
(array([1., 0.9, 1. ]), array([0.9375, 1., 1. ]), array([0.96774194, 0.94736842, 1. ]), array([16, 9, 13], dtype=int64))

0.9736842105263158
[[15  1  0]
 [ 0  9  0]
 [ 0  0 13]]
(array([1., 0.9, 1. ]), array([0.9375, 1., 1. ]), array([0.96774194, 0.94736842, 1. ]), array([16, 9, 13], dtype=int64))

1.0
[[16  0  0]
 [ 0  9  0]
 [ 0  0 13]]
(array([1., 1., 1.]), array([1., 1., 1.]), array([1., 1., 1.]), array([16, 9, 13], dtype=int64))



Next comes the MultinomialNB algorithm. Based on the words in a customer's complaint, it predicts which product category the complaint falls under.
https://gist.github.com/nithyadurai87/3ce9dab55025fe1fd41b4da48d3fcbd8


import pandas as pd
from io import StringIO
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# on_bad_lines='skip' (pandas >= 1.3) skips malformed rows
df = pd.read_csv('./Consumer_Complaints.csv', sep=',',
                 on_bad_lines='skip',
                 index_col=False, dtype=str)
df = df[pd.notnull(df['Issue'])]

# Number of complaints registered under each product
fig = plt.figure(figsize=(8, 6))
df.groupby('Product').Issue.count().plot.bar(ylim=0)
plt.show()

X_train, X_test, y_train, y_test = train_test_split(
    df['Issue'], df['Product'], random_state=0)

# Word counts -> TF-IDF weights -> multinomial naive Bayes
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(X_train))
clf = MultinomialNB().fit(X_train_tfidf, y_train)

# New text must go through the same two transformations before predicting
new_complaint = ["This company refuses to provide me verification and "
                 "validation of debt per my right under the FDCPA. "
                 "I do not believe this debt is mine."]
print(clf.predict(tfidf_transformer.transform(count_vect.transform(new_complaint))))

tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2',
                        encoding='latin-1', ngram_range=(1, 2),
                        stop_words='english')
features = tfidf.fit_transform(df.Issue).toarray()
print(features)

df['category_id'] = df['Product'].factorize()[0]
pro_cat = df[['Product', 'category_id']].drop_duplicates().sort_values('category_id')
print(pro_cat)

# Words most strongly associated (chi-squared) with each product
for product, category_id in sorted(dict(pro_cat.values).items()):
    indices = np.argsort(chi2(features, df.category_id == category_id)[0])
    print(indices)
    feature_names = np.array(tfidf.get_feature_names_out())[indices]
    unigrams = [v for v in feature_names if len(v.split(' ')) == 1]
    bigrams = [v for v in feature_names if len(v.split(' ')) == 2]
    print(product)
    # argsort is ascending, so the most associated terms are at the end
    print("unigrams:", ", ".join(unigrams[-5:]))
    print("bigrams:", ", ".join(bigrams[-5:]))




Here, a bar graph is first drawn to show how many complaints have been registered under each product. Then the data is split into training and test sets and the model is trained; with train_test_split's default settings, 25% of the rows are kept for testing.

TfidfVectorizer turns all the words in the complaints into features. Then chi2 is used to list the words most strongly associated with each category. From this list, the single words (unigrams) and the two-word phrases (bigrams) that point to each category are found.


17.1 Vectors 

We have already seen that a classification problem predicts a value such as "yes" or "no", represented as 1 or 0. Sometimes, though, the data to be analysed is given as sentences, pictures or images. In such cases it must all be converted into 1's and 0's (numbers). The sklearn library provides various kinds of vectorizers for this, and their usage is shown below.

A collection of many sentences is called a corpus. Functions such as DictVectorizer() and CountVectorizer() convert everything in a corpus into 0's and 1's. The example below uses two corpora. corpus1 is an example for DictVectorizer(), and corpus2 is an example for CountVectorizer(). Next, the variable vectors holds the encoded vectors for the sentences in corpus2. Finally, we see how to compute the euclidean distance between two vectors.
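Euclidean distance is the square root of the sum of squared differences, position by position. Below it is computed by hand for the count vectors of corpus2. Sentence 1 is closest to sentence 2 (about 2.83), the only other sentence sharing a word ("peacock"). It is farther from sentences 3 and 4 (about 3.32 and 3.61).

```python
import math

# Count vectors of the four corpus2 sentences (vocabulary in alphabetical order)
vectors = [[2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0]]

def euclidean(a, b):
    """Square root of the sum of squared differences, position by position."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for k in (1, 2, 3):
    print(k, round(euclidean(vectors[0], vectors[k]), 4))
```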

https://gist.github.com/nithyadurai87/f3fff58ab7272279ef069689fc391dec


from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics.pairwise import euclidean_distances

corpus1 = [{'Gender': 'Male'}, {'Gender': 'Female'}, {'Gender': 'Transgender'},
           {'Gender': 'Male'}, {'Gender': 'Female'}]
corpus2 = ['Bird is a Peacock Bird', 'Peacock dances very well',
           'It eats variety of seeds', 'Cumin seed was eaten by it once']

# Count vectors of the four corpus2 sentences (same as CountVectorizer's output)
vectors = [[2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0]]

# one-hot encoding
v1 = DictVectorizer()
print(v1.fit_transform(corpus1).toarray())
print(v1.vocabulary_)

# bag-of-words (term frequencies, binary frequencies)
v2 = CountVectorizer()
print(v2.fit_transform(corpus2).todense())
print(v2.vocabulary_)

print(TfidfVectorizer().fit_transform(corpus2).todense())
print(HashingVectorizer(n_features=6).transform(corpus2).todense())

# Distance from the first sentence to each of the others
print(euclidean_distances([vectors[0]], [vectors[1]]))
print(euclidean_distances([vectors[0]], [vectors[2]]))
print(euclidean_distances([vectors[0]], [vectors[3]]))




1. DictVectorizer() converts a categorical variable into 1's and 0's. The categorical variable 'Gender' has three values: 'Male', 'Female' and 'Transgender'. First, the unique values are collected into a dictionary. Then, since the corpus has 5 values and 3 unique values, a 5*3 matrix is created. In each row, the column that matches that row's value in the dictionary is set to 1, and all the other columns are set to 0. Each value thus becomes a vector. This is called one-hot encoding.
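Below is a plain-Python sketch of the steps just described. It collects the unique "feature=value" names, sorts them as DictVectorizer does, and puts a 1 in the matching column of each row. It reproduces the 5*3 matrix that the program prints.

```python
corpus1 = [{'Gender': 'Male'}, {'Gender': 'Female'}, {'Gender': 'Transgender'},
           {'Gender': 'Male'}, {'Gender': 'Female'}]

# Sorted unique "feature=value" names become the columns
names = sorted({f"{k}={v}" for row in corpus1 for k, v in row.items()})
index = {name: i for i, name in enumerate(names)}

matrix = []
for row in corpus1:
    vec = [0] * len(names)
    for k, v in row.items():
        vec[index[f"{k}={v}"]] = 1   # 1 in the matching column, 0 elsewhere
    matrix.append(vec)

print(names)
print(matrix)
```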



print(v1.fit_transform(corpus1).toarray())

[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]


fit_transform() learns the vocabulary from the given corpus and converts the corpus into a matrix. The result is stored as a sparse matrix, so toarray() (or todense(), used later) is called to print it as an ordinary dense matrix. vocabulary_ shows the dictionary built from the input, mapping each value to its column number.


print(v1.vocabulary_)

{'Gender=Male': 1, 'Gender=Female': 0, 'Gender=Transgender': 2}
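To see what DictVectorizer() is doing, here is a minimal hand-written sketch in plain Python. The function name one_hot is our own (hypothetical), not part of scikit-learn, and the sketch assumes, as the vocabulary_ output above suggests, that the columns are ordered by the sorted 'Gender=...' names.

```python
def one_hot(records, key):
    """Encode the categorical field `key` of each dict as a one-hot row."""
    # Column names look like 'Gender=Male'; sorting them fixes the column order.
    names = sorted({f"{key}={r[key]}" for r in records})
    vocabulary = {name: i for i, name in enumerate(names)}
    rows = []
    for r in records:
        row = [0] * len(names)
        row[vocabulary[f"{key}={r[key]}"]] = 1  # 1 only in this value's column
        rows.append(row)
    return vocabulary, rows


corpus1 = [{'Gender': 'Male'}, {'Gender': 'Female'}, {'Gender': 'Transgender'},
           {'Gender': 'Male'}, {'Gender': 'Female'}]
vocabulary, rows = one_hot(corpus1, 'Gender')
print(vocabulary)  # {'Gender=Female': 0, 'Gender=Male': 1, 'Gender=Transgender': 2}
print(rows)        # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

The rows match the toarray() output above; DictVectorizer additionally returns floats and a sparse matrix.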


2. CountVectorizer() converts each sentence into a row of numbers that records which words it contains. Our 4 sentences contain 17 unique words, so it creates a matrix of dimension 4*17. In each row, a column is 1 if that word occurs in the sentence and 0 if it does not. This is called bag of words.

Because each word is encoded only by its column in the vocabulary, and not in the order in which the words appear, we cannot tell where in the sentence a word occurred.




Before the words can be encoded, every sentence has to be split into its individual words, called tokens. This step is called tokenization: each sentence is broken at the spaces between words, and the separate words that result are the tokens. Our corpus is:

'Bird is a Peacock Bird', 'Peacock dances very well', 'It eats variety of seeds', 'Cumin seed was eaten by it once'


print(v2.fit_transform(corpus2).todense())

[[2 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1]
 [0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0]
 [0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0]]

There are two kinds of frequency here: binary frequency and term frequency. Binary frequency only records 1s and 0s. Term frequency records how many times each word occurs in a sentence. Here the word Bird occurs twice in the first sentence, so its count is 2.


The 17 unique words collected in the vocabulary are numbered in order, from 0 to 16. Words such as Bird, Peacock and it occur twice in the corpus, but each of them is stored only once in the vocabulary.

The matching is not case-sensitive: 'it' and 'It' are treated as the same word, because CountVectorizer converts all text to lowercase by default. Also, the single-letter word 'a' is not counted at all, because by default only tokens of two or more characters are kept.
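The two tokenization rules just described (lowercasing, and dropping one-letter words) can be reproduced with a small regular-expression tokenizer. This is a sketch: the pattern is, to our knowledge, CountVectorizer's default token_pattern, and tokenize is a hypothetical helper name.

```python
import re

# Assumed to be CountVectorizer's default token_pattern:
# a token is a run of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(sentence):
    """Lowercase a sentence and split it into tokens."""
    return TOKEN_RE.findall(sentence.lower())


print(tokenize('Bird is a Peacock Bird'))           # ['bird', 'is', 'peacock', 'bird']
print(tokenize('Cumin seed was eaten by it once'))
```

Note how 'a' disappears and 'Bird' becomes 'bird', which is why the vocabulary has no entry for 'a' and only one entry for 'it'.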


print(v2.vocabulary_)

{'bird': 0, 'is': 6, 'peacock': 10, 'dances': 3, 'very': 14, 'well': 16, 'it': 7, 'eats': 5, 'variety': 13, 'of': 8, 'seeds': 12, 'cumin': 2, 'seed': 11, 'was': 15, 'eaten': 4, 'by': 1, 'once': 9}
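The bag-of-words matrix and vocabulary above can be rebuilt by hand, which makes the difference between term frequency and binary frequency concrete. This is a minimal sketch, not scikit-learn's implementation; bag_of_words is a hypothetical name, and the token pattern is assumed to match CountVectorizer's default.

```python
import re

# Assumed CountVectorizer defaults: lowercase text, keep tokens of 2+ characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def bag_of_words(corpus, binary=False):
    """Return (vocabulary, matrix); vocabulary maps each word to its column."""
    docs = [TOKEN_RE.findall(sentence.lower()) for sentence in corpus]
    # Columns are numbered in alphabetical order of the unique words.
    words = sorted({t for d in docs for t in d})
    vocabulary = {w: i for i, w in enumerate(words)}
    matrix = []
    for tokens in docs:
        row = [0] * len(vocabulary)
        for t in tokens:
            if binary:
                row[vocabulary[t]] = 1    # binary frequency: present or not
            else:
                row[vocabulary[t]] += 1   # term frequency: how many times
        matrix.append(row)
    return vocabulary, matrix


corpus2 = ['Bird is a Peacock Bird', 'Peacock dances very well',
           'It eats variety of seeds', 'Cumin seed was eaten by it once']
vocabulary, counts = bag_of_words(corpus2)
print(vocabulary['bird'], vocabulary['peacock'], len(vocabulary))  # 0 10 17
print(counts[0])                                  # bird counted twice
print(bag_of_words(corpus2, binary=True)[1][0])   # bird capped at 1
```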


3. TfidfVectorizer() starts from the term frequencies and gives each word a weight instead of a raw count: the count is multiplied by the word's inverse document frequency, so words that occur in many sentences get a smaller weight. Each row is then normalized using L2 normalization, which scales the row so that its Euclidean length is 1.


print(TfidfVectorizer().fit_transform(corpus2).todense())

[[0.84352956 0.         0.         ...
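Here is a plain-Python sketch of the TF-IDF computation just described. It assumes scikit-learn's default settings (smooth_idf=True and norm='l2'), under which idf = ln((1 + n) / (1 + df)) + 1; the function name tfidf is our own.

```python
import math
import re

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf(corpus):
    """TF-IDF with smoothed idf and L2-normalized rows (assumed sklearn defaults)."""
    docs = [TOKEN_RE.findall(s.lower()) for s in corpus]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # df = number of sentences that contain the word
    idf = {w: math.log((1 + n) / (1 + sum(w in d for d in docs))) + 1 for w in vocab}
    rows = []
    for d in docs:
        row = [d.count(w) * idf[w] for w in vocab]   # term frequency x idf
        norm = math.sqrt(sum(v * v for v in row))   # L2 length of the row
        rows.append([v / norm for v in row] if norm else row)
    return vocab, rows


corpus2 = ['Bird is a Peacock Bird', 'Peacock dances very well',
           'It eats variety of seeds', 'Cumin seed was eaten by it once']
vocab, rows = tfidf(corpus2)
print(vocab[0], round(rows[0][0], 8))  # weight of 'bird' in the first sentence
```

The weight of 'bird' should agree with the 0.84352956 printed above, and every row has length 1 after normalization.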








4. HashingVectorizer() converts the text into vectors using a hash function. Unlike the dictionary-and-count approach above, it does not build a dictionary first: each word is passed through a hash function, which directly gives the column the word belongs to. This is called the hashing trick. With the earlier methods, the dictionary keeps growing as more words are added and takes up more and more memory. Hashing avoids that problem, because the number of columns is fixed in advance (n_features=6 here).





print(HashingVectorizer(n_features=6).transform(corpus2).todense())

[[ 0.         -0.70710678 -0.70710678  0.          0.          0.        ]
 [ 0.          0.          0.81649658 -0.40824829  0.40824829  0.        ]
 [ 0.75592895  0.          0.37796447  0.          0.37796447 -0.37796447]
 [ 0.25819889  0.77459667  0.         -0.51639778  0.          0.25819889]]
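The hashing trick itself is only a few lines. The sketch below uses zlib.crc32 as a stand-in hash, so its numbers will NOT match the HashingVectorizer output above (scikit-learn uses a different hash function); hash_vector is a hypothetical name. As with HashingVectorizer's alternate_sign option, one bit of the hash decides whether the token adds +1 or -1.

```python
import math
import re
import zlib

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def hash_vector(sentence, n_features=6):
    """Hashing trick: map each token straight to a column, with no vocabulary."""
    row = [0.0] * n_features
    for token in TOKEN_RE.findall(sentence.lower()):
        h = zlib.crc32(token.encode("utf-8"))   # stand-in hash, not scikit-learn's
        column = h % n_features                 # fixed number of columns
        sign = 1.0 if (h >> 16) & 1 else -1.0   # one hash bit picks the sign
        row[column] += sign
    norm = math.sqrt(sum(v * v for v in row))   # L2-normalize, like the output above
    return [v / norm for v in row] if norm else row


for s in ['Bird is a Peacock Bird', 'Peacock dances very well']:
    print(hash_vector(s))
```

No dictionary is stored anywhere: the same word always lands in the same column, however many new words appear later.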

5. euclidean_distances computes the distance between two encoded sentences, which shows how similar they are. The distance between the first and second sentences is the smallest, the distance between the first and third is somewhat larger, and the distance between the first and fourth is larger still.


print(euclidean_distances([vectors[0]], [vectors[1]]))

[[2.82842712]]

print(euclidean_distances([vectors[0]], [vectors[2]]))

[[3.31662479]]
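Euclidean distance is just the square root of the summed squared differences, so the printed values can be checked by hand. This sketch reuses the count vectors from the listing; euclidean is our own helper name.

```python
import math

vectors = [[2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0]]


def euclidean(u, v):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


for k in (1, 2, 3):
    # distance from sentence 1 to sentences 2, 3 and 4
    print(k + 1, round(euclidean(vectors[0], vectors[k]), 8))
# 2 2.82842712   (sqrt(8))
# 3 3.31662479   (sqrt(11))
# 4 3.60555128   (sqrt(13))
```

The growing distances confirm the ordering described above: sentence 2 is the closest to sentence 1 and sentence 4 the farthest.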






17.2 Natural Language Toolkit

The vectors created so far cover only a handful of sentences. A real corpus has many more sentences and thousands of distinct words, so the vectors become very large, and most of their entries are 0. Vectors that consist mostly of 0s are called sparse vectors. For example, suppose we collect sentences from different domains such as politics, cinema and sports and convert them all into vectors. Cinema-related words will not appear in the political sentences, and sports-related words will not appear in the cinema sentences, so every row ends up filled with many 0s.
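One simple way to store a sparse vector compactly is to keep only its non-zero entries together with their column numbers. This is a hand-rolled sketch of the idea (to_sparse and to_dense are hypothetical names); libraries such as SciPy provide proper sparse matrix types built on the same principle.

```python
def to_sparse(row):
    """Keep only the non-zero entries as {column_index: value}."""
    return {i: v for i, v in enumerate(row) if v != 0}


def to_dense(sparse, length):
    """Rebuild the full row, filling every missing column with 0."""
    row = [0] * length
    for i, v in sparse.items():
        row[i] = v
    return row


row = [2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
sparse = to_sparse(row)
print(sparse)                              # {0: 2, 6: 1, 10: 1}
print(to_dense(sparse, len(row)) == row)   # True
```

Only 3 of the 17 numbers need to be stored; with thousands of columns the saving becomes much larger.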

This leads to two main problems.

First, such vectors take up a lot of memory and space. To avoid storing all those 0s, special data types that keep only the non-zero values are used; scikit-learn's vectorizers return their results in such a sparse format (a SciPy sparse matrix), which is why .toarray() or .todense() was needed above to print them.

Second, as the dimensionality grows, more training samples are needed for the model to learn from the data.

Otherwise, the model is likely to overfit. This problem is called the 'curse of dimensionality' or the 'Hughes effect'. Reducing the number of dimensions to avoid it is called dimensionality reduction.

For our sentences, passing stop_words='english' removes common words such as is, was and are, so that only the important words are kept in the dictionary. This reduces the dimensionality.

The NLTK library is also useful here. Its stemmer and lemmatizer functions reduce words to their root forms, which reduces the dimensionality further.



https://gist.github.com/nithvadurai87/491e5e6f9c009ebd88912e718f9363a4


"""
import nltk
nltk.download()
"""

from sklearn.feature_extraction.text import CountVectorizer
from nltk import word_tokenize
from nltk.stem import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk import pos_tag


def lemmatize(token, tag):
    # Lemmatize only nouns ('n') and verbs ('v'); return other tokens unchanged
    if tag[0].lower() in ['n', 'v']:
        return WordNetLemmatizer().lemmatize(token, tag[0].lower())
    return token


corpus = ['Bird is a Peacock Bird', 'Peacock dances very well',
          'It eats variety of seeds', 'Cumin seed was eaten by it once']

print(CountVectorizer().fit_transform(corpus).todense())
print(CountVectorizer(stop_words='english').fit_transform(corpus).todense())

print(PorterStemmer().stem('seeds'))
print(WordNetLemmatizer().lemmatize('gathering', 'v'))
print(WordNetLemmatizer().lemmatize('gathering', 'n'))

# Stem every token of every sentence
s_lines = []
for document in corpus:
    s_words = []
    for token in word_tokenize(document):
        s_words.append(PorterStemmer().stem(token))
    s_lines.append(s_words)
print('Stemmed:', s_lines)

# Tag each token with its part of speech, then lemmatize it
tagged_corpus = []
for document in corpus:
    tagged_corpus.append(pos_tag(word_tokenize(document)))

l_lines = []
for document in tagged_corpus:
    l_words = []
    for token, tag in document:
        l_words.append(lemmatize(token, tag))
    l_lines.append(l_words)
print('Lemmatized:', l_lines)



Before using NLTK, download its data as follows.


import nltk
nltk.download()

[Figure: the NLTK Downloader window, listing the collections all-corpora, all-nltk, book, popular, tests and third-party, with the download directory /home/shrini/nltk_data]


'Bird is a Peacock Bird', 'Peacock dances very well', 'It eats variety of seeds', 'Cumin seed was eaten by it once'
















1. For the sentences above, CountVectorizer() on its own creates an ordinary 4*17 vector.


print(CountVectorizer().fit_transform(corpus).todense())

[[2 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1]
 [0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0]
 [0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0]]


When stop_words='english' is passed for the same sentences, the words is, very, well, it, of, was, by and once are removed, so the dimensionality drops to 4*9.


print(CountVectorizer(stop_words='english').fit_transform(corpus).todense())

[[2 0 0 0 0 1 0 0 0]
 [0 0 1 0 0 1 0 0 0]
 [0 0 0 0 1 0 0 1 1]
 [0 1 0 1 0 0 1 0 0]]
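The effect of stop-word removal on this corpus can be reproduced with a tiny stop list. The STOP_WORDS set below is a hypothetical, hand-picked list containing only the words this corpus loses; scikit-learn's built-in 'english' list is much longer. count_matrix is our own helper name.

```python
import re

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")
# Hypothetical stop list: just the words removed from this corpus.
STOP_WORDS = {'is', 'very', 'well', 'it', 'of', 'was', 'by', 'once'}


def count_matrix(corpus, stop_words=frozenset()):
    """Bag-of-words counts after dropping stop words."""
    docs = [[t for t in TOKEN_RE.findall(s.lower()) if t not in stop_words]
            for s in corpus]
    vocab = sorted({t for d in docs for t in d})
    return vocab, [[d.count(w) for w in vocab] for d in docs]


corpus = ['Bird is a Peacock Bird', 'Peacock dances very well',
          'It eats variety of seeds', 'Cumin seed was eaten by it once']
vocab, matrix = count_matrix(corpus, STOP_WORDS)
print(vocab)   # 9 words left instead of 17
for row in matrix:
    print(row)
```

The four rows match the 4*9 output above.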


2. Even with stop_words='english', the words seeds and seed, which have the same meaning, are still treated as two different words. PorterStemmer() helps to reduce them to a single form: it strips the suffix from each English word and keeps only the root. It does this by looking at each word on its own, without considering the words around it.


print(PorterStemmer().stem('seeds'))

seed
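The core idea of stemming, stripping suffixes by rule without looking at context, fits in a few lines. This is a deliberately tiny toy stemmer with a hypothetical rule list; it is NOT the Porter algorithm, which applies many more rules (for example, Porter turns 'was' into 'wa', whereas this toy keeps it).

```python
# Toy suffix-stripping stemmer: much simpler than PorterStemmer.
SUFFIXES = ('ing', 'ed', 'es', 's')


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


print([toy_stem(w) for w in ['seeds', 'seed', 'eats', 'dances', 'gathering', 'was']])
# ['seed', 'seed', 'eat', 'danc', 'gather', 'was']
```

Like a real stemmer, it maps seeds and seed to the same root, and like Porter it can produce non-words such as 'danc'.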


3. WordNetLemmatizer() also reduces English words to their root form, but it takes into account how the word is used, that is, its part of speech. The same word may be a noun in one place and a verb in another, so the two cases are handled separately. For example, in 'I am gathering foods for birds' and 'seeds are stored in the gathering place', gathering is reduced to gather when it is used as a verb and kept as gathering when it is used as a noun.


print(WordNetLemmatizer().lemmatize('gathering', 'v'))

gather

print(WordNetLemmatizer().lemmatize('gathering', 'n'))

gathering
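A lemmatizer is essentially a lookup keyed on both the word and its part of speech. The sketch below uses a hypothetical mini-lexicon; WordNetLemmatizer consults the much larger WordNet database instead, but the dependence on the part of speech is the same.

```python
# Hypothetical mini-lexicon: (word, part_of_speech) -> lemma.
LEXICON = {
    ('gathering', 'v'): 'gather',
    ('is', 'v'): 'be',
    ('was', 'v'): 'be',
    ('eaten', 'v'): 'eat',
    ('eats', 'v'): 'eat',
    ('dances', 'v'): 'dance',
    ('seeds', 'n'): 'seed',
}


def toy_lemmatize(word, pos='n'):
    """Return the lemma for (word, pos), or the word itself if it is unknown."""
    return LEXICON.get((word.lower(), pos), word)


print(toy_lemmatize('gathering', 'v'))  # gather
print(toy_lemmatize('gathering', 'n'))  # gathering
print(toy_lemmatize('was', 'v'))        # be
```

Unlike the stemmer, a lemmatizer returns real dictionary words (be, eat, dance), which is why the Lemmatized output below contains 'be' rather than 'wa'.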


4. Next, we apply NLTK to our whole corpus. The output is as follows.




print('Stemmed:', s_lines)

Stemmed: [['bird', 'is', 'a', 'peacock', 'bird'], ['peacock', 'danc', 'veri', 'well'], ['It', 'eat', 'varieti', 'of', 'seed'], ['cumin', 'seed', 'wa', 'eaten', 'by', 'it', 'onc']]

print('Lemmatized:', l_lines)

Lemmatized: [['Bird', 'be', 'a', 'Peacock', 'Bird'], ['Peacock', 'dance', 'very', 'well'], ['It', 'eat', 'variety', 'of', 'seed'], ['Cumin', 'seed', 'be', 'eat', 'by', 'it', 'once']]




18. Decision Trees & Random Forest


Decision trees and random forests are non-linear models that can be used for both regression and classification, on data that cannot be separated by a straight line. A decision tree splits the data into branches based on the values it contains. In the example below, DecisionTreeClassifier() and RandomForestClassifier() are used to decide whether a flower is a jasmine, a lotus or a rose. The type of each flower is determined by 4 features: the length and width of its sepal, and the length and width of its petal.

DecisionTreeClassifier() splits the training data into parts based on these features, and how the data is split is decided by a set of conditions. Because these conditions are worked out in advance during training, such models are called eager learners. KNN, by contrast, is a lazy learner.

Random forest works on a principle called ensemble learning. Ensemble means a group: the model builds many different decision trees and combines them as a group to make its prediction. Each tree in the forest is trained on a different random sample of the training data, so the accuracy tends to be higher. You can see this in the example below: the decision tree gives 89% accuracy and the random forest gives 97%. How each model splits the data is also drawn as a graph.
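The two ingredients of a random forest described above, a different random sample for each tree and a combined decision, can be sketched without any library. Both helper names are hypothetical. Note that this shows the textbook majority vote; scikit-learn's RandomForestClassifier actually averages the trees' predicted class probabilities rather than counting hard votes.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: each tree sees a different sample."""
    return [rng.choice(rows) for _ in range(len(rows))]


def majority_vote(predictions):
    """Combine the trees' predictions by picking the most common label."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
training_rows = list(range(10))     # stand-in for 10 training rows
print(bootstrap_sample(training_rows, rng))   # some rows repeat, some are missing
print(majority_vote(['Rose', 'Lotus', 'Rose', 'Jasmine', 'Rose']))  # Rose
```

Because every tree is trained on a slightly different sample, the trees make different mistakes, and combining them cancels out many of those mistakes.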

https://gist.github.com/nithvadurai87/d21ffb25b7f5a38d90a437e9f169d58e


from sklearn.datasets import load_iris
import pandas as pd
import os
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from io import StringIO
import pydotplus
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from IPython.display import Image
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('flowers.csv')
X = df[list(df.columns)[:-1]]   # every column except the last is a feature
y = df[df.columns[-1]]          # assumed: the last column holds the flower name

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Decision tree
a = DecisionTreeClassifier(criterion="entropy", random_state=100,
                           max_depth=3, min_samples_leaf=5)  # or criterion="gini"
a.fit(X_train, y_train)
y_pred = a.predict(X_test)
print("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
print("Accuracy : ", accuracy_score(y_test, y_pred) * 100)
print("Report : ", classification_report(y_test, y_pred))

dot_data = StringIO()
export_graphviz(a, out_file=dot_data, filled=True, rounded=True,
                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
graph.write_png("decisiontree.png")

# Random forest
b = RandomForestClassifier(max_depth=None, n_estimators=100)
b.fit(X_train, y_train)
y_pred = b.predict(X_test)
print("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
print("Accuracy : ", accuracy_score(y_test, y_pred) * 100)
print("Report : ", classification_report(y_test, y_pred))

# Draw one of the 100 trees in the forest
export_graphviz(b.estimators_[5], out_file='tree.dot',
                feature_names=X_train.columns.tolist(),
                class_names=['Lotus', 'Jasmin', 'Rose'],
                rounded=True, proportion=False, precision=2, filled=True)
os.system("dot -Tpng tree.dot -o randomforest.png -Gdpi=600")
Image(filename='randomforest.png')

# Plot how important each feature is to the forest
f = pd.Series(b.feature_importances_,
              index=X_train.columns.tolist()).sort_values(ascending=False)
sns.barplot(x=f, y=f.index)
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.legend()
plt.show()


Explanation of the program:

The flowers.csv file contains data for 150 flowers. train_test_split() first splits it so that 112 flowers are used for training and the remaining 38 for testing the model. In the decision tree below, samples = 112 in the first node shows that the 112 training samples are given to it.

value = [34, 41, 37] means that 34 of these samples are jasmine, 41 are lotus and 37 are rose. entropy = 1.581 indicates the uncertainty / disorder / impurity present in the data, that is, how mixed together the different classes are in the data we have to split. It is calculated as follows. First, for each class, the number of samples in that class is divided by the total number of samples to give a fraction. Then the log base 2 of that fraction is computed (an online calculator is available at https://www.miniwebtool.com/log-base-2-calculator/). This is done separately for each class (jasmine, lotus and rose): each fraction is multiplied by its log, the results for all classes are added together, and a minus sign is put in front to give the entropy.

Entropy = - {Summation of (fraction of each class × log base 2 of that fraction)}

= -{ (34/112) × log2(34/112) + (41/112) × log2(41/112) + (37/112) × log2(37/112) }

= -{ (0.3035) × log2(0.3035) + (0.3661) × log2(0.3661) + (0.3303) × log2(0.3303) }

= -{ (0.3035) × (-1.7202) + (0.3661) × (-1.4496) + (0.3303) × (-1.5981) }

= -{ -0.5220 + -0.5307 + -0.5278 }

= -{ -1.5805 }

= 1.581
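The same calculation as a short Python function, which can also be used to check the other node entropies in the tree. entropy is our own helper name; it takes the per-class sample counts of a node, exactly like the value = [...] lists.

```python
import math


def entropy(counts):
    """Entropy (log base 2) of a node, given the number of samples in each class."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c:                      # empty classes contribute nothing
            p = c / total          # fraction of this class
            result -= p * math.log2(p)
    return result


print(round(entropy([34, 41, 37]), 3))   # 1.581  (root node)
print(round(entropy([34, 41, 0]), 3))    # 0.994
print(entropy([0, 0, 37]))               # 0.0    (a pure node)
```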



[Figure: the decision tree built by DecisionTreeClassifier (decisiontree.png), shown here as text]

X2 <= 2.35 | entropy = 1.581 | samples = 112 | value = [34, 41, 37]
  True:  entropy = 0.0 | samples = 37 | value = [0, 0, 37]
  False: X2 <= 4.95 | entropy = 0.994 | samples = 75 | value = [34, 41, 0]
    True:  X3 <= 1.55 | entropy = 0.414 | samples = 36 | value = [33, 3, 0]
      True:  entropy = 0.0 | samples = 30 | value = [30, 0, 0]
      False: entropy = 1.0 | samples = 6 | value = [3, 3, 0]
    False: X3 <= 1.85 | entropy = 0.172 | samples = 39 | value = [1, 38, 0]
      True:  entropy = 0.503 | samples = 9 | value = [1, 8, 0]
      False: entropy = 0.0 | samples = 30 | value = [0, 30, 0]


We have already seen that the entropy of the first node is 1.581. Since it is greater than 0, the 112 samples are split by a condition into two groups of 37 and 75. The condition is on X2, the Petal_length column: samples whose value is at most 2.35 go to the left node, and the rest go to the right node. The entropy is then calculated again for each of the two groups.

The entropy of the left node is 0.0. An entropy of 0 means that all the samples in the node have been sorted into a single class. Its value is [0, 0, 37]: the count is 0 for jasmine, 0 for lotus and 37 for rose. So this is a leaf node that makes the final decision: the flower is a rose.

In the same way, more nodes are created on the right, and other features are also taken into account. The nodes at the bottom of the diagram decide whether a flower is a jasmine or a lotus. Look at the values of the first 3 bottom nodes from the left: instead of all 34 jasmine samples ending up in one place, they are spread as 30, 3 and 1 across separate nodes. Similarly, in the first 3 bottom nodes from the right, the 41 lotus samples are spread as 30, 8 and 3. The leaf entropies are 0.0, 1.0, 0.503 and 0.0; the tree stops here because it has reached max_depth=3, even though two of the leaves are still mixed.



Information Gain:

At each step, the tree has to decide which feature to split the data on. The measure used for this is called Information Gain: it measures how much the entropy decreases after a split, where entropy is a measure of impurity. Gini impurity is another way of measuring impurity, and the corresponding reduction is called Gini gain. The formula for Information Gain is as follows.

Information Gain = Parent's entropy - child's entropy with weighted average

child's entropy with weighted average = [(no. of examples in left child node) / (total no. of examples in parent node) × (entropy of left node)] + [(no. of examples in right child node) / (total no. of examples in parent node) × (entropy of right node)]

For the first split:

child's entropy with weighted average = (37/112) × 0.0 + (75/112) × 0.994
= 0 + 0.665625
≈ 0.666

Information Gain = 1.581 - 0.666 = 0.915
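The formula above translates directly into code. This sketch (information_gain is our own helper name) takes the parent's class counts and a list of the children's class counts, and reproduces the first split of the tree.

```python
import math


def entropy(counts):
    """Entropy (log base 2) from per-class sample counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Root split X2 <= 2.35: [34, 41, 37] -> [0, 0, 37] and [34, 41, 0]
gain = information_gain([34, 41, 37], [[0, 0, 37], [34, 41, 0]])
print(round(gain, 3))  # 0.915
```

The tree tries candidate conditions on every feature and keeps the one with the largest gain.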

In DecisionTreeClassifier(), if criterion="gini" is given instead of criterion="entropy", Gini impurity is used instead of entropy to choose the splits.
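For comparison, here is a sketch of Gini impurity (1 minus the sum of squared class fractions) and the resulting Gini gain for the same root split. The helper names are our own; the split is the one from the entropy tree above, not necessarily the split a criterion="gini" tree would choose.

```python
def gini(counts):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = sum(parent)
    return gini(parent) - sum(sum(ch) / n * gini(ch) for ch in children)


print(round(gini([34, 41, 37]), 4))   # impurity of the root node
print(gini([0, 0, 37]))               # 0.0 for a pure node, just like entropy
print(round(gini_gain([34, 41, 37], [[0, 0, 37], [34, 41, 0]]), 4))
```

Gini impurity avoids the logarithm, so it is slightly cheaper to compute, and in practice it usually picks similar splits to entropy.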




The graph below shows one of the trees built by the Random Forest model (b.estimators_[5]).

[Figure: randomforest.png, one tree from the random forest]






















































































The graph below shows how much each feature contributes to the Random Forest's classification, that is, the importance of each feature.

[Figure: bar chart of feature importance scores]







19. Clustering with K-Means


K-means is the first unsupervised learning algorithm we will look at. All the algorithms we have seen so far, such as logistic regression and multi-class classification, are based on supervised learning. There we provide both the input (X) and the output (Y) for training, and we define in advance the output classes that the data must be sorted into. In unsupervised learning, only the input is given: we are not told how the data should be split or what the groups are. In clustering, K-means is used to find these groups, and the number of groups (clusters) to split the data into can be found with the elbow method. In short, if we supply the definitions and ask for the output, it is supervised learning; if we ask for the output without supplying any definitions, it is unsupervised learning.

In the example below, only the two features X1 and X2 are given; there is no Y. Using only this input, we have to divide the data into groups ourselves.



x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]


Let us see this in the following program.

https://gist.github.com/nithvadurai87/185e332ebce7028af265adbe86db40d5


import matplotlib.pyplot as plt
import math


def plots(cluster1_x1, cluster1_x2, cluster2_x1, cluster2_x2):
    # Draw the first cluster as dots and the second as stars
    plt.figure()
    plt.plot(cluster1_x1, cluster1_x2, '.')
    plt.plot(cluster2_x1, cluster2_x2, '*')
    plt.grid(True)
    plt.show()


def round1(c1_x1, c1_x2, c2_x1, c2_x2):
    cluster1_x1 = []
    cluster1_x2 = []
    cluster2_x1 = []
    cluster2_x2 = []

    # Assign each point to the nearer of the two centroids
    for i, j in zip(x1, x2):
        a = math.sqrt((i - c1_x1)**2 + (j - c1_x2)**2)
        b = math.sqrt((i - c2_x1)**2 + (j - c2_x2)**2)
        if a < b:
            cluster1_x1.append(i)
            cluster1_x2.append(j)
        else:
            cluster2_x1.append(i)
            cluster2_x2.append(j)

    plots(cluster1_x1, cluster1_x2, cluster2_x1, cluster2_x2)

    # New centroids: the mean of each cluster
    c1_x1 = sum(cluster1_x1) / len(cluster1_x1)
    c1_x2 = sum(cluster1_x2) / len(cluster1_x2)
    c2_x1 = sum(cluster2_x1) / len(cluster2_x1)
    c2_x2 = sum(cluster2_x2) / len(cluster2_x2)

    round2(c1_x1, c1_x2, c2_x1, c2_x2)


def round2(c1_x1, c1_x2, c2_x1, c2_x2):
    cluster1_x1 = []
    cluster1_x2 = []
    cluster2_x1 = []
    cluster2_x2 = []

    # Same assignment step, now using the updated centroids
    for i, j in zip(x1, x2):
        c = math.sqrt((i - c1_x1)**2 + (j - c1_x2)**2)
        d = math.sqrt((i - c2_x1)**2 + (j - c2_x2)**2)
        if c < d:
            cluster1_x1.append(i)
            cluster1_x2.append(j)
        else:
            cluster2_x1.append(i)
            cluster2_x2.append(j)

    plots(cluster1_x1, cluster1_x2, cluster2_x1, cluster2_x2)


x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]

plots(x1, x2, [], [])
round1(x1[4], x2[4], x1[10], x2[10])


First, let us see with a scatter plot how the two features X1 and X2 are spread out. At this point we do not yet know how to split them into two groups, so all the points are drawn with the same marker.






plots(x1, x2, [], [])
19.1 Centroids (centre points)

To form the clusters, we first choose two starting points at random. For the first cluster we choose 13 for X1 and 17 for X2; for the second cluster we choose 9 for X1 and 10 for X2. These points are called centroids, and the data is split into two based on them: for every data point, the distance to each of the two chosen centroids is computed with the formulas below.


distance1 = sqrt((x1_data - 13)**2 + (x2_data - 17)**2)
distance2 = sqrt((x1_data - 9)**2 + (x2_data - 10)**2)



For example, for the first point (15, 13): distance1 = sqrt((15-13)**2 + (13-17)**2) = sqrt(4 + 16) = sqrt(20) = 4.47, and distance2 = sqrt((15-9)**2 + (13-10)**2) = sqrt(36 + 9) = sqrt(45) = 6.70.

x1   x2   distance1   distance2   nearer centroid
15   13     4.47        6.70           1
19   16     6.08       11.66           1
15   17     2.00        9.21           1
 5    6    13.60        5.65           2
13   17     0.00        8.06           1
17   14     5.00        8.94           1
15   15     2.82        7.81           1
12   13     4.12        4.24           1
 8    7    11.18        3.16           2
 6    6    13.03        5.00           2
 9   10     8.06        0.00           2
13   12     5.00        4.47           2


If the first distance is smaller, the point goes into the first cluster; otherwise it goes into the second cluster. The table above shows which cluster each point falls into.






















In this way four lists are built: the x1 and x2 values of the first cluster, and the x1 and x2 values of the second cluster. In the graph, the first cluster is drawn as dots and the second as stars.

cluster1_x1 = [15, 19, 15, 13, 17, 15, 12]
cluster1_x2 = [13, 16, 17, 17, 14, 15, 13]
cluster2_x1 = [5, 8, 6, 9, 13]
cluster2_x2 = [6, 7, 6, 10, 12]

plots(cluster1_x1, cluster1_x2, cluster2_x1, cluster2_x2)
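The assignment step can also be written compactly by treating each (x1, x2) pair as a point. This sketch (assign is a hypothetical helper) uses math.dist for the Euclidean distance and starts from the same centroids, (13, 17) and (9, 10); it reproduces the four lists above.

```python
import math

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


points = list(zip(x1, x2))
labels = assign(points, [(13, 17), (9, 10)])
cluster1 = [p for p, k in zip(points, labels) if k == 0]
cluster2 = [p for p, k in zip(points, labels) if k == 1]
print(cluster1)
print(cluster2)
```

On an exact tie this version picks the first centroid, whereas the listing's `if a < b` picks the second; no ties occur in this data.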
Once the first clusters have been formed this way, new centroids are chosen for them. This time they are not chosen at random: the mean of the x1 values and the mean of the x2 values in each cluster are computed and used as that cluster's new centroid. This lets us form two clusters whose points lie closer together.


c1_x1 = (15 + 19 + 15 + 13 + 17 + 15 + 12) / 7
      = 106 / 7
      = 15.14

c1_x2 = (13 + 16 + 17 + 17 + 14 + 15 + 13) / 7
      = 105 / 7
      = 15

c2_x1 = (5 + 8 + 6 + 9 + 13) / 5
      = 41 / 5
      = 8.2

c2_x2 = (6 + 7 + 6 + 10 + 12) / 5
      = 41 / 5
      = 8.2
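The centroid update is just a per-coordinate mean. This short sketch (centroid is our own helper name) recomputes the four values above from the two clusters.

```python
def centroid(points):
    """Mean of the x1 values and mean of the x2 values of a cluster."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


cluster1 = list(zip([15, 19, 15, 13, 17, 15, 12], [13, 16, 17, 17, 14, 15, 13]))
cluster2 = list(zip([5, 8, 6, 9, 13], [6, 7, 6, 10, 12]))
print(centroid(cluster1))   # (106/7, 15.0), i.e. about (15.14, 15.0)
print(centroid(cluster2))   # (8.2, 8.2)
```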



Then, for every data point, the distance to the new centroids is computed again, and each point is assigned to the cluster whose centroid is nearer. In this way the clusters are formed once more, and the points are grouped more tightly than before.
This process is repeated until the clusters stop changing. Doing all of this automatically is what clustering with k-means does: k is the number of clusters (groups) to form, and means refers to grouping the data around the average value of each feature. Let us now see how to find the right value of k.
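Putting the two steps in a loop gives plain k-means. This is a minimal sketch (kmeans is our own function, not scikit-learn's KMeans): it starts from the book's hand-picked centroids (13, 17) and (9, 10) and alternates the assignment and update steps until no point changes cluster.

```python
import math


def kmeans(points, centroids, max_rounds=100):
    """Plain k-means: repeat assign + update until the assignment stops changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_rounds):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:        # nothing moved: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[k] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids


x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
labels, centroids = kmeans(list(zip(x1, x2)), [(13, 17), (9, 10)])
print(labels)
print(centroids)   # [(14.875, 14.625), (7.0, 7.25)]
```

On this data the point (13, 12) moves to the first cluster in the second round, and the third round changes nothing, so the loop stops with centroids (14.875, 14.625) and (7.0, 7.25).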


19.2 Elbow Method 

A graph helps us find the right number of clusters for the given data. We will use it on the data above to check whether 2 clusters, the number we picked ourselves earlier, is really correct. The program and its explanation follow.

https://gist.github.com/nithvadurai87/10b5b273151c80be97579d684279cd84


from sklearn.cluster import KMeans
from sklearn import metrics
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
X = np.array(list(zip(x1, x2)))

distortions = []
K = range(1, 8)
for i in K:
    model = KMeans(n_clusters=i)
    model.fit(X)
    # Average distance from each point to its nearest cluster centre
    distortions.append(
        sum(np.min(cdist(X, model.cluster_centers_, 'euclidean'), axis=1))
        / X.shape[0])

plt.plot(K, distortions, 'bx-')
plt.show()




Here, the two features x1 and x2 are combined with numpy into a single array X. k-means is then trained on this data several times, with the number of clusters ranging from 1 to 7. In each round, the distance of every data point from its nearest centroid is computed and averaged, so that we can see how much it decreases as the number of clusters grows. This value is called the cost or distortion.


These values are then drawn as a graph: the x-axis shows the number of clusters and the y-axis shows the corresponding distortion.

With a single cluster, every point is measured against one centroid, so the distortion is large (about 5 for this data). As the number of clusters grows towards 7, each point sits closer to its own centroid and the distortion keeps falling. The point where the curve bends sharply, like an elbow, is taken as the right number of clusters; this is called the Elbow method. In this graph the bend occurs at 2: up to 2 clusters the distortion drops steeply, and after that each extra cluster reduces it only slightly. So 2 clusters is a suitable choice for this data.
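Reading the elbow off a graph is a judgement call. As a rough numeric aid (a heuristic assumed here, not part of the method described above), the sketch below recomputes the same distortions and picks the k where the curve bends most, measured by the largest second difference d[k-1] - 2*d[k] + d[k+1]. The settings `n_init=10` and `random_state=0` are added only to make the run repeatable.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
X = np.array(list(zip(x1, x2)))

def distortion(X, k):
    """Average distance from each point to its nearest cluster centre."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return np.min(cdist(X, model.cluster_centers_, 'euclidean'), axis=1).mean()

K = list(range(1, 8))
d = [distortion(X, k) for k in K]

# Second difference at each interior k: a large value means a sharp bend (the elbow)
bends = {K[i]: d[i - 1] - 2 * d[i] + d[i + 1] for i in range(1, len(K) - 1)}
elbow = max(bends, key=bends.get)
print(elbow)
```

The heuristic only formalises what the eye does on the graph; on noisier data the bend may be less clear and the graph should still be checked.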








19.3 silhouette_coefficient

How do we measure how well an algorithm has worked? In supervised learning we can compare the predictions with the known answers and compute accuracy. In unsupervised learning such as k-means there are no known answers to compare against, so we need a different measure of how well the clusters have been formed. The silhouette_coefficient is one such measure.


In k-means the data points are grouped into clusters. The distortion we saw above keeps falling as more clusters are added, so on its own it cannot tell us whether the clusters are good. The silhouette_coefficient checks how tightly the points within each cluster are packed and how well the clusters are separated from one another, and it is used to judge the quality of the clusters formed by kmeans. Its value lies between -1 and 1; values close to 1 mean compact, well-separated clusters. For each data point it is computed as:

$$s = \frac{b - a}{\max(a, b)}$$

Here a is the mean distance between the point and all other points in its own cluster, and b is the mean distance between the point and all points in the nearest neighbouring cluster. The silhouette_coefficient of the whole clustering is the average of s over all points.
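To make the formula concrete, the sketch below computes a and b by hand for every point, averages s, and checks the result against scikit-learn's `silhouette_samples` and `silhouette_score`. The labels are the 2-cluster assignment for this data, fixed here so the result does not depend on a k-means run.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
X = np.array(list(zip(x1, x2)), dtype=float)
labels = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1])  # 2-cluster assignment

# Pairwise Euclidean distances between all points
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

s = np.empty(len(X))
for i in range(len(X)):
    same = (labels == labels[i])
    same[i] = False                          # exclude the point itself
    a = dist[i, same].mean()                 # mean distance within its own cluster
    b = min(dist[i, labels == c].mean()      # mean distance to the nearest other cluster
            for c in set(labels) if c != labels[i])
    s[i] = (b - a) / max(a, b)

print(s.mean())
```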



In the example below, kmeans first forms 2 clusters from the data. A for loop then repeats this for 3, 4, 5 and 8 clusters. For each case the points are drawn in a separate subplot, with one colour and marker per cluster, so the groupings can be compared side by side, and the silhouette_coefficient value is printed.

https://gist.github.com/nithyadurai87/f5f043df412b6e3c8291d0080422bd92
from sklearn.cluster import KMeans
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
X = np.array(list(zip(x1, x2)))

# First subplot: the raw data
plt.subplot(3, 2, 1)
plt.scatter(x1, x2)

# One colour and one marker per cluster (8 needed for n_clusters=8);
# entries beyond those legible in the original listing are assumed
c = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'b']
m = ['o', 's', 'D', 'v', 'p', '*', 'h', '+']

p = 1
for k in [2, 3, 4, 5, 8]:
    p += 1
    plt.subplot(3, 2, p)
    model = KMeans(n_clusters=k).fit(X)
    print(model.labels_)
    # Draw each point in the colour and marker of its cluster
    for idx, label in enumerate(model.labels_):
        plt.plot(x1[idx], x2[idx], color=c[label], marker=m[label], ls='None')
    print(metrics.silhouette_score(X, model.labels_, metric='euclidean'))

plt.show()


print(model.labels_) shows, for each of the 12 data points formed by x1 and x2, the cluster it was assigned to. With 2 clusters, each point is labelled 0 or 1.





The labels and the silhouette_coefficient value are printed as follows:

[1 1 1 0 1 1 1 1 0 0 0 1]
0.6366488776743281

Similarly, when 3 clusters are formed, 0 denotes the first cluster, 1 the second and 2 the third:

[0 0 0 1 0 0 0 2 1 1 1 2]
0.38024538066050284

Likewise, the labels and coefficient values for 4, 5 and 8 clusters are printed below. Comparing all the values, 2 clusters gives the highest silhouette_coefficient (0.63), so 2 is the best choice for this data.

[2 0 0 1 0 0 0 2 1 1 3 2]
0.32248773306926665

[2 4 0 1 0 4 0 2 1 1 3 2]
0.38043265897525885

[6 7 3 4 3 1 6 2 0 4 5 2]
0.27672998081717154

In the plots, the first subplot shows the original data. The second shows that 2 clusters separate the data cleanly. In the plots for 3, 4, 5 and 8 clusters, points that naturally belong together are split across different clusters. With 8 clusters in particular, every cluster holds fewer than three points, since 12 points are divided among 8 clusters. This agrees with the coefficient values printed by the loop.
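The comparison above can also be automated: fit k-means for several values of k and keep the one with the highest silhouette_coefficient. This is a small sketch; the range 2 to 8 and the settings `n_init=10, random_state=0` are assumptions for a repeatable run (the silhouette needs at least 2 clusters).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
X = np.array(list(zip(x1, x2)))

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric='euclidean')

# Keep the number of clusters with the highest score
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```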
20. Support Vector Machine (SVM)


Support Vector Machine (SVM) is an algorithm for classifying data into groups. It resembles logistic regression, but the way SVM places the boundary between the classes differs from logistic regression. When the data can be separated by a straight line, SVM draws the boundary with the widest possible gap on both sides, which is why it is called a large margin classifier. When the data cannot be separated by a straight line (non-linear data), SVM uses functions called kernels.



20.1 Large margin classifier (linear)

In the example below, the same data is classified with both logistic regression and svm, so we can compare where each one places its boundary. x1 and x2 are the two features; they are combined into a 2 dimension matrix with numpy. Logistic regression and svm are then fitted in turn, and after each fit the classifier() function draws the boundary line the model has learned, together with the data points.


https://gist.github.com/nithyadurai87/2de5a6a6f7cc03c2791305f5c33d43d7


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
# (older scikit-learn versions also exposed this as sklearn.linear_model.logistic, since removed)
from sklearn.linear_model import LogisticRegression

def classifier(regressor):
    # Boundary w0*x1 + w1*x2 + b = 0, solved for x2 and drawn over the data
    xx = np.linspace(1, 10)
    yy = -regressor.coef_[0][0] / regressor.coef_[0][1] * xx - regressor.intercept_[0] / regressor.coef_[0][1]
    plt.plot(xx, yy)
    plt.scatter(x1, x2)
    plt.show()

x1 = [2, 6, 3, 9, 4, 10]
x2 = [3, 9, 3, 10, 2, 13]
X = np.array([[2, 3], [6, 9], [3, 3], [9, 10], [4, 2], [10, 13]])
y = [0, 1, 0, 1, 0, 1]

# Logistic regression boundary
regressor = LogisticRegression()
regressor.fit(X, y)
classifier(regressor)

# Linear SVM boundary
regressor = svm.SVC(kernel='linear', C=1.0)
regressor.fit(X, y)
classifier(regressor)


The line drawn by logistic regression does separate the two classes, but it can lie much closer to the points of one class than to those of the other; it is not placed evenly between the two groups.



The line drawn by SVM, on the other hand, lies in the middle, at an equal distance from the nearest points of both classes, leaving the largest possible gap (margin) on either side. That is why it is called an equal margin or large margin classifier. To find this line, SVM uses an optimization objective different from the one used in logistic regression.
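One way to see the equal-margin idea in numbers: for a linear boundary w·x + b = 0, the distance of a point to the boundary is |w·x + b| / ||w||. The sketch below fits both models on the same six points and reports, for each class, the distance from the boundary to that class's closest point. For the linear SVM the two gaps should come out (nearly) equal; for logistic regression they need not. The helper `class_gaps` is a hypothetical name introduced for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.linear_model import LogisticRegression

X = np.array([[2, 3], [6, 9], [3, 3], [9, 10], [4, 2], [10, 13]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])

def class_gaps(model):
    """Distance from the boundary to the closest point of each class (hypothetical helper)."""
    w, b = model.coef_[0], model.intercept_[0]
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    return dist[y == 0].min(), dist[y == 1].min()

log_gap = class_gaps(LogisticRegression().fit(X, y))
svm_gap = class_gaps(svm.SVC(kernel='linear', C=1.0).fit(X, y))
print('logistic:', log_gap)
print('svm:     ', svm_gap)
```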



20.2 Kernels (non-linear) 

Kernels are used to classify non-linear data, that is, data that cannot be separated by a straight line. For such data we earlier used polynomial regression: new higher order features, such as the square and cube of the existing features, were added so that a curved boundary could be drawn. But when there are many features, adding all these higher-order terms makes the number of features very large and the algorithm slow to train. To avoid this, kernels (also called similarity functions) are used. Instead of raising the features to higher powers, a kernel creates new features by measuring how similar each data point is to a set of chosen reference points.

Consider training data with 5 features and 100 rows. With the polynomial approach, the square and cube of the 5 features would be computed, giving about 20 features. With a kernel, each of the 100 training rows is instead used as a reference point, called a landmark. For every input, its similarity to each landmark is computed: if the input is close to a landmark the similarity is close to 1, and if it is far away the similarity is close to 0. Each similarity value becomes a new feature, so every row gets one feature per landmark.

This similarity function is what we call a kernel. There are several kinds of kernels. The one built on exp(), given below, is called the gaussian kernel; others include the polynomial kernel, string kernel, chi-squared kernel and histogram-intersection kernel.

$$f_1 = \text{similarity}\left(x, l^{(1)}\right) = \exp\left(-\frac{\left\lVert x - l^{(1)}\right\rVert^{2}}{2\sigma^{2}}\right)$$
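A minimal numpy sketch of the gaussian similarity above, assuming sigma = 1. It builds the landmark features described earlier by using every training row as a landmark, and checks the result against scikit-learn's `rbf_kernel`, which writes the same function as exp(-gamma * ||x - l||^2); so gamma = 1 / (2 * sigma**2). The small dataset is made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def similarity(x, l, sigma=1.0):
    """Gaussian kernel: close to 1 when x is near landmark l, close to 0 when far away."""
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

# Hypothetical training data: 4 rows, 2 features
X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])

# Every training row is a landmark, so each row gets one feature per landmark
F = np.array([[similarity(x, l) for l in X] for x in X])
print(F.round(3))
```

Each row of F is the new feature vector for the matching training row; nearby rows give values near 1 and distant rows values near 0.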


An SVM without kernels works much like logistic regression: both use the raw features directly and draw a straight-line boundary. With kernels, the SVM uses the new similarity features built from the landmarks instead of the raw features. This raises the question of when to use logistic regression or an SVM without a kernel, and when to use an SVM with a kernel:



If the number of features (for example 10000) is large compared with the number of training examples, use logistic regression or an svm without a kernel. If the number of features is small (for example 1000) and there are plenty of training examples, use an svm with a kernel.

In the example below, measurements of three kinds of flowers (Rose, Jasmin and Lotus) are classified. Since an svm without a kernel behaves like logistic regression, logistic regression is compared against an svm with a kernel, and the accuracy of each is printed.


https://gist.github.com/nithyadurai87/9d7cc99cc4ae18a3707cc76f8711193b


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# flowers.csv: measurement columns followed by a 'Flower' label column
df = pd.read_csv('./flowers.csv')
X = df[list(df.columns)[:-1]]
y = df['Flower']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression (behaves like an svm without a kernel)
logistic = LogisticRegression()
logistic.fit(X_train, y_train)
y_pred = logistic.predict(X_test)
print('Accuracy-logistic:', accuracy_score(y_test, y_pred))

# SVM with a gaussian (rbf) kernel
gaussian = SVC(kernel='rbf')
gaussian.fit(X_train, y_train)
y_pred = gaussian.predict(X_test)
print('Accuracy-svm:', accuracy_score(y_test, y_pred))

Output:

Accuracy-logistic: 0.868421052631579
Accuracy-svm: 0.9736842105263158
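flowers.csv is not included here. The same comparison can be sketched with scikit-learn's built-in iris dataset, which also has four measurements (sepal and petal length and width) for three kinds of flowers. This substitution is an assumption, so the accuracies need not match the output above. `max_iter=1000` is added only so the logistic solver converges without warnings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear model: comparable to an svm without a kernel
logistic = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc_logistic = accuracy_score(y_test, logistic.predict(X_test))

# SVM with a gaussian (rbf) kernel
gaussian = SVC(kernel='rbf').fit(X_train, y_train)
acc_svm = accuracy_score(y_test, gaussian.predict(X_test))

print('Accuracy-logistic:', acc_logistic)
print('Accuracy-svm:', acc_svm)
```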





21. PCA - Principal Component Analysis


Principal Component Analysis converts data with many dimensions (features) into data with fewer dimensions. For example, data with 1000 features can be reduced to 100 features. Only the X values (the features) are transformed; the Y values (the target) are left untouched. This is why PCA is called dimensionality reduction. It should be used as follows:

• First, train the algorithm on the original data (x1,y1),(x2,y2),(x3,y3)...

• Only if that does not give good results, apply PCA to reduce the X values to fewer dimensions.

• Then train the algorithm again on the reduced X.

PCA should not be applied by default; it is meant for specific situations, described next.
When a dataset has a very large number of features, running an algorithm on it takes a long time. In image data, for example, every pixel is a feature, so a single image can contribute a huge number of features. In such cases PCA is used to reduce the number of features, which saves memory and speeds up training. Before applying PCA, feature scaling must be carried out. This step is called data-preprocessing.

In the example below, the 4-dimensional flower data is reduced to 2 dimensions using PCA. Before PCA is applied, the values are normalized with StandardScaler. The data holds 4 measurements (sepal length, sepal width, petal length and petal width) for each of three kinds of flowers: Rose, Jasmin and Lotus. PCA turns these 4 columns into two new columns, x1 and x2, and the scatter plot shows the three kinds of flowers in three colours.


https://gist.github.com/nithyadurai87/20d18bbda53e43de19222e24d330a398





import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv('./flowers.csv')
X = df[list(df.columns)[:-1]]

# Feature scaling before PCA
x = StandardScaler().fit_transform(X)

# Reduce the 4 measurement columns to 2 principal components
pca = PCA(n_components=2)
new_x = pd.DataFrame(data=pca.fit_transform(x), columns=['x1', 'x2'])
df2 = pd.concat([new_x, df[['Flower']]], axis=1)

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('x1', fontsize=15)
ax.set_ylabel('x2', fontsize=15)
ax.set_title('2 Components', fontsize=20)
# One colour per kind of flower
for i, j in zip(['Rose', 'Jasmin', 'Lotus'], ['g', 'b', 'r']):
    ax.scatter(df2.loc[df2['Flower'] == i, 'x1'], df2.loc[df2['Flower'] == i, 'x2'], c=j)
ax.legend(['Rose', 'Jasmin', 'Lotus'])
ax.grid()
plt.show()

print(pca.explained_variance_ratio_)
print(df.columns)
print(df2.columns)


Output:

[0.72207932 0.24134489]

Index(['Sepal_length', 'Sepal_width', 'Petal_length', 'Petal_width', 'Flower'], dtype='object')

Index(['x1', 'x2', 'Flower'], dtype='object')



[Figure: 2 Components, a scatter plot of x1 against x2 with Rose, Jasmin and Lotus shown in different colours]


Here the explained variance is [0.72207932 0.24134489]. Adding the two gives about 0.96, which means the two new components together retain 96% of the information (variance) in the original 4 features. The explained variance of a component tells how much of the original data's variance it captures, so by checking it we can judge whether PCA has reduced the features without losing much information.
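The arithmetic behind the 96% figure, using the two ratios printed above. `np.cumsum` gives the running total, which shows the share of variance kept by the first 1, 2, ... components.

```python
import numpy as np

ratios = np.array([0.72207932, 0.24134489])  # pca.explained_variance_ratio_ from the output
running = np.cumsum(ratios)
print(running)               # variance kept by the first component, then by both
print(f'{running[-1]:.0%}')  # share kept by both components together
```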


21.1 Data Projection 

Mapping data onto fewer dimensions is called projection; the line (or plane) onto which the data is mapped is called the projection line or projection area. Consider data with 2 features, x1 and x2, drawn as a scatter plot. If we draw a line close to the points and move each point onto that line, every point can be described by a single number, its position along the line, so the 2 dimension data becomes 1 dimension data. Similarly, data with 3 features x1, x2 and x3 can be projected onto a plane (a 2-dimensional projection area), turning it into 2 dimension data.





21.2 Projection Error 

The distance between each data point and its projected position on the line is called the projection error. In 2 dimensions this looks a lot like linear regression, but PCA is not linear regression. Linear regression uses the features to predict a value y; PCA makes no prediction and only projects the X values. Linear regression minimizes the sum of squares error, measured vertically along the y-axis, whereas PCA minimizes the projection error, measured perpendicular to the projection line.
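A small numeric check of this difference on made-up 2-D data. The PCA line (through the mean, along the first principal direction) gives the smallest sum of squared perpendicular distances, while the least-squares regression line gives the smallest sum of squared vertical distances; neither line wins on the other's measure. The data and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)   # hypothetical correlated data
P = np.column_stack([x, y])
mean = P.mean(axis=0)

# PCA direction: first right-singular vector of the centred data
_, _, Vt = np.linalg.svd(P - mean)
u = Vt[0]
pca_slope = u[1] / u[0]
pca_intercept = mean[1] - pca_slope * mean[0]   # the line passes through the mean

# Least-squares regression line y = slope*x + intercept
ols_slope, ols_intercept = np.polyfit(x, y, 1)

def perpendicular_sse(slope, intercept):
    """Sum of squared perpendicular distances to the line (PCA's projection error)."""
    return np.sum((y - slope * x - intercept) ** 2) / (1 + slope ** 2)

def vertical_sse(slope, intercept):
    """Sum of squared vertical distances (linear regression's error)."""
    return np.sum((y - slope * x - intercept) ** 2)

print('perpendicular:', perpendicular_sse(pca_slope, pca_intercept), perpendicular_sse(ols_slope, ols_intercept))
print('vertical:     ', vertical_sse(pca_slope, pca_intercept), vertical_sse(ols_slope, ols_intercept))
```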








21.3 Compressed components

Let us now see, step by step, how the compressed components with reduced dimensions are computed.













1. First, apply feature scaling to the data (x1, x2, x3...xn). This is called data preprocessing.

2. To find how the features vary with one another, compute the covariance matrix. It is written as follows and denoted by sigma:

covariance matrix / sigma:

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\left(x^{(i)}\right)^{T}$$

This matrix is symmetric and positive semi-definite. It is used to find the directions (projection vectors) onto which the data will be projected.



3. Compute the projection vectors with the svd() or eig() function. These stand for singular value decomposition and eigenvectors respectively. For example:

[U, S, V] = svd(Sigma)

This gives three matrices. The columns of U are the projection vectors u1, u2, u3 ... un. To reduce the data to k features, we keep only the first k of them, u1, u2, u3 ... uk; these are called the principal components. The compressed value for each data point x is computed as:

$$\text{principal components} = \left(U[:,\,1{:}k]\right)^{T} x$$



4. How many principal components (k) to keep is decided by how much of the variance they retain. For example, if 99% of the variance is retained, very little information has been lost. The value of k can be chosen with the following check.


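$$\frac{\text{Average squared projection error}}{\text{Total variance in the data}} \leq 0.01$$

(that is, 99% of the variance is retained)

Where,

$$\text{Average squared projection error} = \frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - x^{(i)}_{\text{projected}}\right\rVert^{2}$$

$$\text{Total variance in the data} = \frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)}\right\rVert^{2}$$

Here x_projected is the point x after projection, expressed in the original coordinates.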


Finding k this way means repeating the whole computation for every candidate value of k, which is costly. Instead, the S matrix returned by svd() can be used directly: choose the smallest k for which

$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \geq 0.99$$
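The four steps above, put together as a numpy sketch. The data (5 features driven by 2 hidden signals plus a little noise) and the helper `choose_k` are illustrative assumptions. `np.linalg.svd` returns S as a 1-D array holding the diagonal values, and the 1-based slice U[:, 1:k] in the text becomes `U[:, :k]` in Python. For mean-centred data the projection-error ratio of step 4 equals 1 minus the S ratio, so the final line confirms the choice of k.

```python
import numpy as np

def choose_k(S, retain=0.99):
    """Smallest k with sum(S[:k]) / sum(S) >= retain (hypothetical helper)."""
    ratios = np.cumsum(S) / np.sum(S)
    return int(np.searchsorted(ratios, retain) + 1)

rng = np.random.default_rng(2)
signals = rng.normal(size=(200, 2))
X = signals @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))

# 1. Feature scaling: mean 0, standard deviation 1 per feature
X = (X - X.mean(axis=0)) / X.std(axis=0)
m = X.shape[0]

# 2. Covariance matrix: (1/m) * sum of x x^T over all rows
Sigma = X.T @ X / m

# 3. Projection vectors are the columns of U
U, S, _ = np.linalg.svd(Sigma)

# 4. Smallest k that keeps 99% of the variance
k = choose_k(S)
U_reduce = U[:, :k]
Z = X @ U_reduce                 # compressed data: each row is z = U_reduce^T x
X_projected = Z @ U_reduce.T     # projected points in the original coordinates

error_ratio = (np.sum((X - X_projected) ** 2) / m) / (np.sum(X ** 2) / m)
print(k, error_ratio)
```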

In this way, data with many dimensions can be reduced to fewer dimensions while losing very little of its information.
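For comparison, scikit-learn's `PCA` (used in the flower example earlier) can make the same choice when `n_components` is given as a fraction between 0 and 1: it keeps just enough components to explain more than that share of the variance. The sketch below uses the same kind of made-up data and assumes `svd_solver='full'`. Its `explained_variance_ratio_` should match S divided by the sum of S from the manual steps.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
signals = rng.normal(size=(200, 2))
X = signals @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
X = StandardScaler().fit_transform(X)

# Manual route: singular values of the covariance matrix
S = np.linalg.svd(X.T @ X / X.shape[0], compute_uv=False)

# scikit-learn route: keep enough components for 99% of the variance
pca = PCA(n_components=0.99, svd_solver='full').fit(X)
print(pca.n_components_, pca.explained_variance_ratio_)
```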



22. Neural Networks 


A neural network is a model built to imitate the way the human brain works. The brain contains a huge number of nerve cells (neurons) connected to one another; each neuron receives signals from other neurons, processes them and passes its own signal on. A neural network follows the same idea: it is made of many small units arranged in layers, each taking inputs from the layer before it, computing a value and passing it on to the next layer.


Let us see why neural networks are needed, using a classification problem. In binary classification with two features x1 and x2, logistic regression computes h(x) directly from those features. A neural network, instead of using the raw features directly, first creates one or more hidden layers of activation units and computes h(x) from them, as follows.




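$$h(x) = g\left(\theta^{(2)}_{0} a_0 + \theta^{(2)}_{1} a_1 + \theta^{(2)}_{2} a_2\right)$$

Where

$$a_1 = g\left(\theta^{(1)}_{10} x_0 + \theta^{(1)}_{11} x_1 + \theta^{(1)}_{12} x_2\right)$$

Likewise a_2, with its own weights θ(1)20, θ(1)21, θ(1)22. The units x_0 and a_0 always hold the value 1.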


The value of each activation unit lies between 0 and 1. It is computed by passing the parameters and the features through the sigmoid function. The activation units of the next layer are then computed from these values in the same way, until the final activation unit is reached. In neural networks the parameters are called weights. In short, h(x) is obtained by passing the last layer's activation units and their weights through the sigmoid function.


22.1 Neural Network structure

When a very large number of features is needed, logistic regression becomes unwieldy, and neural networks are used instead.

A neural network for binary classification has a single output unit.

A neural network for multi-class classification has one output unit for each class.













In these networks, the extra units x0 and a0, which always hold the value 1, are called bias units.

Input layer: the first layer, which holds the raw features.

Output layer: the last layer, which holds the final computed value.

Hidden layer / Activation layer: the layers in between, which are hidden from view. The activation units of the first hidden layer are computed from the raw features, and those of each later hidden layer are computed from the layer before it.



22.2 Computing h(x)

Let us see how h(x) is computed with the sigmoid function, using examples that implement the logical operations AND, OR and NOT.






Suppose the weights are -30, 20 and 20, and g(z) is the sigmoid function. What is the output when x1, x2 are 0,0? When they are 0,1? And for 1,0 and 1,1? It is computed as follows.



AND: for the inputs 0,0, the value z passed to g is -30. On the sigmoid curve, -30 gives a value very close to 0, so the output is 0. The other inputs are computed in the same way. The results form the truth table of AND: h(x) is 1 only when both x1 and x2 are 1.


Weights = -30, 20, 20 (x0 = 1)

x1  x2  z = -30*x0 + 20*x1 + 20*x2     h(x) = g(z)  (AND)
0   0   -30 + 20*0 + 20*0 = -30        0
0   1   -30 + 20*0 + 20*1 = -10        0
1   0   -30 + 20*1 + 20*0 = -10        0
1   1   -30 + 20*1 + 20*1 = 10         1


OR: with the weights -10, 20 and 20, g(z) is computed the same way. This gives the truth table of OR: h(x) is 1 when x1 is 1, when x2 is 1, or when both are 1.


Weights = -10, 20, 20 (x0 = 1)

x1  x2  z = -10*x0 + 20*x1 + 20*x2     h(x) = g(z)  (OR)
0   0   -10 + 20*0 + 20*0 = -10        0
0   1   -10 + 20*0 + 20*1 = 10         1
1   0   -10 + 20*1 + 20*0 = 10         1
1   1   -10 + 20*1 + 20*1 = 30         1
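The AND and OR tables can be reproduced with a single sigmoid unit. This sketch uses the weights from the text, with x0 = 1 as the bias input; rounding g(z) to the nearest integer gives the 0/1 output. The function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(weights, x1, x2):
    """One activation unit: g(w0*x0 + w1*x1 + w2*x2) with bias input x0 = 1."""
    w0, w1, w2 = weights
    return sigmoid(w0 * 1 + w1 * x1 + w2 * x2)

AND_W = (-30, 20, 20)
OR_W = (-10, 20, 20)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(unit(AND_W, x1, x2)), round(unit(OR_W, x1, x2)))
```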


NOT: next, let us compute NOT x1 AND NOT x2, which is 1 only when both inputs are 0. First NOT x1 and NOT x2 are computed separately, and then the two results are combined with AND. For NOT, the weights 10 and -20 are used.


Weights = 10, -20 (x0 = 1)

x1  x2  z = 10*x0 - 20*x1   NOT(x1)  z = 10*x0 - 20*x2   NOT(x2)  NOT x1 AND NOT x2
0   0   10                  1        10                  1        1
0   1   10                  1        -10                 0        0
1   0   -10                 0        10                  1        0
1   1   -10                 0        -10                 0        0


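Combining these units gives the neural network below. Its hidden layer has a1 = x1 AND x2 and a2 = NOT x1 AND NOT x2, and the output unit computes h(x) = a1 OR a2. The result is 1 exactly when x1 and x2 are equal (the XNOR operation):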


x1  x2  a1  a2  h(x)
0   0   0   1   1
0   1   0   0   0
1   0   0   0   0
1   1   1   0   1
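A sketch of this combined network in plain Python: two hidden units (AND, and NOT x1 AND NOT x2) feed an OR unit. The text gives the NOT weights for one input at a time; the weights (10, -20, -20) for the two-input NOT x1 AND NOT x2 unit are an assumption, chosen so that the unit fires only when both inputs are 0, which matches the a2 column above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(weights, x1, x2):
    """g(w0*1 + w1*x1 + w2*x2): one sigmoid unit with a bias input of 1."""
    w0, w1, w2 = weights
    return sigmoid(w0 + w1 * x1 + w2 * x2)

def network(x1, x2):
    a1 = unit((-30, 20, 20), x1, x2)    # x1 AND x2
    a2 = unit((10, -20, -20), x1, x2)   # NOT x1 AND NOT x2 (assumed weights)
    return unit((-10, 20, 20), a1, a2)  # a1 OR a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(network(x1, x2)))
```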































22.3 Forward propagation 


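Layer 1:

$$a^{(1)} = x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$

Layer 2:

$$z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 1 \\ g\left(z^{(2)}\right) \end{bmatrix}$$

Layer 3:

$$h(x) = a^{(3)} = g\left(\Theta^{(2)} a^{(2)}\right)$$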


The activation units of the first layer are simply the raw features. This is the input layer.

Each activation unit of the second (hidden) layer is obtained by multiplying the values of the first layer by their weights, adding them up and passing the result through g. Each unit has its own set of weights.

In the same way, the units of every following layer are computed from the values of the previous layer and its weights, moving forward one layer at a time until the output is reached. This process is called forward propagation.
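Forward propagation in matrix form, as a numpy sketch. Each Θ matrix holds one row of weights per unit in the next layer, with the first column multiplying the bias unit. As an illustrative choice, Θ(1) holds the AND unit and an assumed NOT x1 AND NOT x2 unit, and Θ(2) holds the OR unit, so the network reproduces the XNOR table of 22.2; all four inputs are processed at once, one per column.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, thetas):
    """Forward propagation; X has one example per column (without the bias row)."""
    a = X
    for theta in thetas:
        a = np.vstack([np.ones((1, a.shape[1])), a])   # add the bias unit (x0 = 1 or a0 = 1)
        a = sigmoid(theta @ a)                         # z = Theta a, then a = g(z)
    return a

Theta1 = np.array([[-30.0, 20.0, 20.0],    # a1 = x1 AND x2
                   [10.0, -20.0, -20.0]])  # a2 = NOT x1 AND NOT x2 (assumed weights)
Theta2 = np.array([[-10.0, 20.0, 20.0]])   # h(x) = a1 OR a2

X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)  # each column is one (x1, x2) pair
h = forward(X, [Theta1, Theta2])
print(np.round(h).astype(int))
```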


22.4 Back propagation 

Once the neural network has computed h(x), we need to know how large its error is. Back propagation computes the error of each unit by working backwards, from the last layer towards the first. Using partial derivatives, it measures how much each weight contributes to the cost of the network. The gradient descent algorithm then uses these values to adjust the weights of every neuron step by step until the cost reaches its minimum.

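delta = error of each node in the corresponding layer

Layer 3:

$$\delta^{(3)} = h(x) - y$$

Layer 2:

$$\delta^{(2)} = \left(\Theta^{(2)}\right)^{T}\delta^{(3)} \mathbin{.*} a^{(2)} \mathbin{.*} \left(1 - a^{(2)}\right)$$

Layer 1: no delta is computed, because the input layer holds the observed features, which have no error.

where g'(z) = a .* (1 - a) is g-prime, the derivative of the activation function g, and .* denotes element-wise multiplication.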


(Accumulator matrix ) 
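The delta equations above can be checked numerically. The sketch below is an illustration under stated assumptions: sigmoid activations, a single training example, and the logistic (cross-entropy) cost, which is the cost for which δ(3) = h(x) − y holds. It compares the back-propagated gradients with finite-difference estimates (a "gradient check"); the network size and random weights are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta1, theta2, x, y):
    """Logistic (cross-entropy) cost for one training example."""
    a1 = np.concatenate(([1.0], x))
    a2 = np.concatenate(([1.0], sigmoid(theta1 @ a1)))
    h = sigmoid(theta2 @ a2)
    return float(-np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)))

def backprop(theta1, theta2, x, y):
    """Gradients of the cost with respect to theta1 and theta2."""
    # Forward propagation
    a1 = np.concatenate(([1.0], x))
    a2 = np.concatenate(([1.0], sigmoid(theta1 @ a1)))
    a3 = sigmoid(theta2 @ a2)
    # Output layer: delta3 = h(x) - y
    delta3 = a3 - y
    # Hidden layer: delta2 = Theta2^T delta3 .* a2 .* (1 - a2), bias entry dropped
    delta2 = ((theta2.T @ delta3) * a2 * (1 - a2))[1:]
    # Accumulator contributions: delta(l+1) a(l)^T
    return np.outer(delta2, a1), np.outer(delta3, a2)

def numerical_gradient(f, theta, eps=1e-5):
    """Central-difference estimate of df/dtheta (theta is modified in place, then restored)."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        old = theta[idx]
        theta[idx] = old + eps
        plus = f()
        theta[idx] = old - eps
        minus = f()
        theta[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
theta1 = rng.normal(scale=0.5, size=(3, 3))   # 2 inputs + bias -> 3 hidden units
theta2 = rng.normal(scale=0.5, size=(1, 4))   # 3 hidden units + bias -> 1 output
x, y = np.array([0.4, 0.3]), np.array([1.0])

g1, g2 = backprop(theta1, theta2, x, y)
n1 = numerical_gradient(lambda: cost(theta1, theta2, x, y), theta1)
n2 = numerical_gradient(lambda: cost(theta1, theta2, x, y), theta2)
print(np.max(np.abs(g1 - n1)), np.max(np.abs(g2 - n2)))
```

Both printed differences should be tiny; summing the `np.outer(...)` terms over all examples gives the accumulator matrices Δ(1) and Δ(2).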



23. Perceptron 


The perceptron is the foundation of neural networks. It is a binary classification algorithm that separates data into two classes using a straight line. Its goal is similar to that of logistic regression, but it works differently: modeled on the way people learn, it goes through the training data one example at a time and learns from each one.

Consider the following example with 4 rows of data. It has 2 features, x1 and x2, and an output y that is either 1 or 0.

x1, x2, y
[0.4, 0.3, 1],
[0.6, 0.8, 1],
[0.7, 0.5, 1],
[0.9, 0.2, 0]

We saw earlier that a neural network does not learn from the features directly; it learns through the activation units it computes in between. The perceptron works the same way: instead of plugging the features and their weights straight into a hypothesis, it first computes an activation unit from them. Based on the value of that activation unit it predicts which class a row belongs to, and in the process it learns the right weights. Note that the parameters are called weights here.

https://gist.github.com/nithyadurai87/e6794ec008a7855681db4ba9164b54af


def predict(row, weights):
    # Activation = bias weight w0 (x0 = 1) plus the weighted sum of the features
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    # Heaviside step: predict 1 only if the activation is greater than 0
    return 1.0 if activation > 0.0 else 0.0

def train_weights(dataset, l_rate, n_epoch):
    # Start with all weights (bias + one per feature) set to 0
    weights = [0.0 for i in range(len(dataset[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in dataset:
            # error = actual - predicted (1, 0 or -1)
            error = row[-1] - predict(row, weights)
            sum_error += error**2
            # Update the bias weight (x0 = 1) and each feature weight
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        print('epoch=%d, error=%.2f' % (epoch, sum_error))
    print(weights)

dataset = [[0.4, 0.3, 1],
           [0.6, 0.8, 1],
           [0.7, 0.5, 1],
           [0.9, 0.2, 0]]

l_rate = 0.1
n_epoch = 6

train_weights(dataset, l_rate, n_epoch)

Output:

epoch=0, error=2.00
epoch=1, error=2.00
epoch=2, error=2.00
epoch=3, error=2.00
epoch=4, error=1.00
epoch=5, error=0.00
[0.1, -0.16, 0.06999999999999998]


How it works, step by step:





To start, the weights for the features are all set to 0, that is, [0, 0, 0]. The first weight belongs to x0, the bias unit; as we saw earlier, the bias unit always has the value 1. The next two weights belong to x1 and x2.

With these weights, the activation unit for the first row [0.4, 0.3, 1] is computed as follows. Here the Heaviside (step) activation function is used; it is simply another kind of activation function, like the sigmoid.


Activation_unit_1 = w0.x0 + w1.x1 + w2.x2
                  = 0(1) + 0(0.4) + 0(0.3)
                  = 0

If Activation_unit > 0, predict 1; else predict 0.
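The activation and the Heaviside rule above can be written as two small functions. This is a sketch that assumes each row is stored as [x1, x2, y], as in the dataset above; a sigmoid is included only to show how it would treat the same activation (it outputs a probability rather than a hard 0/1).

```python
import math

def activation_unit(weights, row):
    """w0*x0 + w1*x1 + w2*x2 with bias input x0 = 1.
    row = [x1, x2, y]; the last entry (the label) is ignored."""
    x = [1.0] + list(row[:-1])
    return sum(w * xi for w, xi in zip(weights, x))

def heaviside_predict(activation):
    # Predict 1 only when the activation is strictly greater than 0
    return 1 if activation > 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

a = activation_unit([0.0, 0.0, 0.0], [0.4, 0.3, 1])
print(a, heaviside_predict(a), sigmoid(a))   # 0.0 0 0.5
```

With all-zero weights the activation is exactly 0, so the strict "greater than 0" rule predicts 0, while a sigmoid would sit on the fence at 0.5.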

If the computed value is greater than 0 the perceptron predicts 1, otherwise it predicts 0. Here the value is 0, so it predicts 0. But in the training data the actual value is 1.

Because the computed prediction and the actual value in the training data differ (1 != 0), the weights must be changed before learning from the next row. The new weights are computed with the following formula.



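w0 = w0 + learning_rate * (actual - predict) * x0

and in the same way for every weight wj (with x0 = 1):

wj = wj + learning_rate * (actual - predict) * xj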


Each weight is updated from its current value using the learning rate. This is the same learning rate we used in gradient descent: it controls how large each update is, that is, how much the weights are adjusted in response to an error. Here it is set to 0.1.

The learning rate is multiplied by the difference between the actual and predicted values and by the value of the feature that the weight belongs to, and the result is added to the old weight. This gives the new value of the weight.
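A single application of the update rule can be sketched as below; `update_weights` is a hypothetical helper name, and rows are again assumed to be [x1, x2, y]. It reproduces the first update worked out next in the text.

```python
def update_weights(weights, row, prediction, l_rate=0.1):
    """Perceptron rule: w_j = w_j + l_rate * (actual - predicted) * x_j, with x0 = 1."""
    error = row[-1] - prediction
    x = [1.0] + list(row[:-1])
    return [w + l_rate * error * xi for w, xi in zip(weights, x)]

new_w = update_weights([0.0, 0.0, 0.0], [0.4, 0.3, 1], prediction=0)
print([round(w, 2) for w in new_w])   # [0.1, 0.04, 0.03]
```

When the prediction is correct, the error is 0 and every weight stays exactly as it was, which is why correctly predicted rows leave the weights unchanged.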



The weights computed with this formula are:

w0 = 0 + 0.1 * 1 * 1 = 0.10
w1 = 0 + 0.1 * 1 * 0.4 = 0.04
w2 = 0 + 0.1 * 1 * 0.3 = 0.03

Using these updated weights, the activation unit for the second row [0.6, 0.8, 1] is computed in the same way:

Activation_unit_2 = w0.x0 + w1.x1 + w2.x2
                  = 0.1(1) + 0.04(0.6) + 0.03(0.8)
                  = 0.1 + 0.024 + 0.024
                  = 0.148

Since this is greater than 0, it predicts 1, and the training data also has 1. So, without changing the weights, the activation unit for the third row [0.7, 0.5, 1] is computed:

Activation_unit_3 = w0.x0 + w1.x1 + w2.x2
                  = 0.1(1) + 0.04(0.7) + 0.03(0.5)
                  = 0.1 + 0.028 + 0.015
                  = 0.143

This also predicts 1, and the training data also has 1. So, again without changing the weights, the activation unit for the fourth row [0.9, 0.2, 0] is computed:

Activation_unit_4 = w0.x0 + w1.x1 + w2.x2
                  = 0.1(1) + 0.04(0.9) + 0.03(0.2)
                  = 0.1 + 0.036 + 0.006
                  = 0.142


This predicts 1, but the training data has 0. So the weights are updated again:

w0 = 0.1 + 0.1 * -1 * 1 = 0.0
w1 = 0.04 + 0.1 * -1 * 0.9 = -0.05
w2 = 0.03 + 0.1 * -1 * 0.2 = 0.01

So, of the 4 rows given for training, 2 were predicted wrongly and 2 correctly. This completes the first epoch: one complete pass of the algorithm over all the training data is called one epoch. It is summarized below.


Epoch = 0

| x1 | x2 | y | weights (w0, w1, w2) | activation unit | predicted y | updated weights for next row (if y != predicted y) |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 1 | 0, 0, 0 | 0*1 + 0*0.4 + 0*0.3 = 0 | 0 | w0 = 0 + 0.1 * 1 * 1 = 0.10; w1 = 0 + 0.1 * 1 * 0.4 = 0.04; w2 = 0 + 0.1 * 1 * 0.3 = 0.03 |
| 0.6 | 0.8 | 1 | 0.1, 0.04, 0.03 | 0.1*1 + 0.04*0.6 + 0.03*0.8 = 0.1 + 0.024 + 0.024 = 0.148 | 1 | |
| 0.7 | 0.5 | 1 | 0.1, 0.04, 0.03 | 0.1*1 + 0.04*0.7 + 0.03*0.5 = 0.1 + 0.028 + 0.015 = 0.143 | 1 | |
| 0.9 | 0.2 | 0 | 0.1, 0.04, 0.03 | 0.1*1 + 0.04*0.9 + 0.03*0.2 = 0.1 + 0.036 + 0.006 = 0.142 | 1 | w0 = 0.1 + 0.1 * -1 * 1 = 0.0; w1 = 0.04 + 0.1 * -1 * 0.9 = -0.05; w2 = 0.03 + 0.1 * -1 * 0.2 = 0.01 |



















The weights obtained at the end of the first epoch are used for the data in the next epoch. In this way, 6 epochs are computed as follows.








Epoch = 1

| x1 | x2 | y | weights (w0, w1, w2) | activation unit | predicted y | updated weights for next row (if y != predicted y) |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 1 | 0, -0.05, 0.01 | 0*1 + -0.05*0.4 + 0.01*0.3 = 0 + -0.02 + 0.003 = -0.017 | 0 | w0 = 0 + 0.1 * 1 * 1 = 0.10; w1 = -0.05 + 0.1 * 1 * 0.4 = -0.01; w2 = 0.01 + 0.1 * 1 * 0.3 = 0.04 |
| 0.6 | 0.8 | 1 | 0.1, -0.01, 0.04 | 0.1*1 + -0.01*0.6 + 0.04*0.8 = 0.1 + -0.006 + 0.032 = 0.126 | 1 | |
| 0.7 | 0.5 | 1 | 0.1, -0.01, 0.04 | 0.1*1 + -0.01*0.7 + 0.04*0.5 = 0.1 + -0.007 + 0.02 = 0.113 | 1 | |
| 0.9 | 0.2 | 0 | 0.1, -0.01, 0.04 | 0.1*1 + -0.01*0.9 + 0.04*0.2 = 0.1 + -0.009 + 0.008 = 0.099 | 1 | w0 = 0.1 + 0.1 * -1 * 1 = 0.0; w1 = -0.01 + 0.1 * -1 * 0.9 = -0.1; w2 = 0.04 + 0.1 * -1 * 0.2 = 0.02 |







Epoch = 2

| x1 | x2 | y | weights (w0, w1, w2) | activation unit | predicted y | updated weights for next row (if y != predicted y) |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 1 | 0, -0.1, 0.02 | 0*1 + -0.1*0.4 + 0.02*0.3 = 0 + -0.04 + 0.006 = -0.034 | 0 | w0 = 0 + 0.1 * 1 * 1 = 0.10; w1 = -0.1 + 0.1 * 1 * 0.4 = -0.06; w2 = 0.02 + 0.1 * 1 * 0.3 = 0.05 |
| 0.6 | 0.8 | 1 | 0.1, -0.06, 0.05 | 0.1*1 + -0.06*0.6 + 0.05*0.8 = 0.1 + -0.036 + 0.04 = 0.104 | 1 | |
| 0.7 | 0.5 | 1 | 0.1, -0.06, 0.05 | 0.1*1 + -0.06*0.7 + 0.05*0.5 = 0.1 + -0.042 + 0.025 = 0.083 | 1 | |
| 0.9 | 0.2 | 0 | 0.1, -0.06, 0.05 | 0.1*1 + -0.06*0.9 + 0.05*0.2 = 0.1 + -0.054 + 0.01 = 0.056 | 1 | w0 = 0.1 + 0.1 * -1 * 1 = 0.0; w1 = -0.06 + 0.1 * -1 * 0.9 = -0.15; w2 = 0.05 + 0.1 * -1 * 0.2 = 0.03 |



























Epoch = 3

| x1 | x2 | y | weights (w0, w1, w2) | activation unit | predicted y | updated weights for next row (if y != predicted y) |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 1 | 0, -0.15, 0.03 | 0*1 + -0.15*0.4 + 0.03*0.3 = 0 + -0.06 + 0.009 = -0.051 | 0 | w0 = 0 + 0.1 * 1 * 1 = 0.10; w1 = -0.15 + 0.1 * 1 * 0.4 = -0.11; w2 = 0.03 + 0.1 * 1 * 0.3 = 0.06 |
| 0.6 | 0.8 | 1 | 0.1, -0.11, 0.06 | 0.1*1 + -0.11*0.6 + 0.06*0.8 = 0.1 + -0.066 + 0.048 = 0.082 | 1 | |
| 0.7 | 0.5 | 1 | 0.1, -0.11, 0.06 | 0.1*1 + -0.11*0.7 + 0.06*0.5 = 0.1 + -0.077 + 0.03 = 0.053 | 1 | |
| 0.9 | 0.2 | 0 | 0.1, -0.11, 0.06 | 0.1*1 + -0.11*0.9 + 0.06*0.2 = 0.1 + -0.099 + 0.012 = 0.013 | 1 | w0 = 0.1 + 0.1 * -1 * 1 = 0.0; w1 = -0.11 + 0.1 * -1 * 0.9 = -0.2; w2 = 0.06 + 0.1 * -1 * 0.2 = 0.04 |







Epoch = 4

| x1 | x2 | y | weights (w0, w1, w2) | activation unit | predicted y | updated weights for next row (if y != predicted y) |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 1 | 0, -0.2, 0.04 | 0*1 + -0.2*0.4 + 0.04*0.3 = 0 + -0.08 + 0.012 = -0.068 | 0 | w0 = 0 + 0.1 * 1 * 1 = 0.10; w1 = -0.2 + 0.1 * 1 * 0.4 = -0.16; w2 = 0.04 + 0.1 * 1 * 0.3 = 0.07 |
| 0.6 | 0.8 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.6 + 0.07*0.8 = 0.1 + -0.096 + 0.056 = 0.06 | 1 | |
| 0.7 | 0.5 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.7 + 0.07*0.5 = 0.1 + -0.112 + 0.035 = 0.023 | 1 | |
| 0.9 | 0.2 | 0 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.9 + 0.07*0.2 = 0.1 + -0.144 + 0.014 = -0.03 | 0 | |









Epoch = 5

| x1 | x2 | y | weights (w0, w1, w2) | activation unit | predicted y | updated weights for next row (if y != predicted y) |
|---|---|---|---|---|---|---|
| 0.4 | 0.3 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.4 + 0.07*0.3 = 0.1 + -0.064 + 0.021 = 0.057 | 1 | |
| 0.6 | 0.8 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.6 + 0.07*0.8 = 0.1 + -0.096 + 0.056 = 0.06 | 1 | |
| 0.7 | 0.5 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.7 + 0.07*0.5 = 0.1 + -0.112 + 0.035 = 0.023 | 1 | |
| 0.9 | 0.2 | 0 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.9 + 0.07*0.2 = 0.1 + -0.144 + 0.014 = -0.03 | 0 | |



In the 6th epoch (epoch = 5), all the rows are predicted correctly.






































In this way, the algorithm keeps adjusting the weights until it finds weights that classify all the data correctly. If a row is predicted correctly, it moves on to the next row. If a row is predicted wrongly, it changes the weights and then moves on to the next row.

Because it learns by recognizing the errors in its predictions, it is called an error-driven learning algorithm. Connecting many such perceptrons together forms the neural networks called MLP (Multi-Layer Perceptron).



24. Artificial Neural Networks


A perceptron separates data using a straight line. A multi-layer perceptron, on the other hand, can learn from data with many features and more complex patterns, by combining many simple functions (the units in each layer).

Just as the human brain learns through many interconnected neurons, the perceptrons in an MLP learn from the data together. An MLP is formed by connecting perceptrons in the form of a directed acyclic graph. It is also called an artificial neural network.



We saw that a perceptron can classify only data that can be separated by a straight line. An MLP can be used to separate data that is arranged in a non-linear way. This is why an MLP is called a universal function approximator.
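A classic way to see the difference is XOR (this example is not from the original text): no straight line separates its two classes. The sketch below shows that a single perceptron trained with the same rule as above never gets a full epoch right, while a two-layer network of step units can compute XOR. The MLP weights here are hand-picked for illustration (an OR unit and an AND unit feeding an output unit), not learned.

```python
def step(z):
    return 1 if z > 0 else 0

def perceptron_errors_per_epoch(dataset, l_rate=0.1, n_epoch=20):
    """Train a single perceptron and return the number of mistakes in each epoch."""
    w = [0.0, 0.0, 0.0]
    history = []
    for _ in range(n_epoch):
        errors = 0
        for x1, x2, y in dataset:
            error = y - step(w[0] + w[1] * x1 + w[2] * x2)
            if error:
                errors += 1
                w = [w[0] + l_rate * error,
                     w[1] + l_rate * error * x1,
                     w[2] + l_rate * error * x2]
        history.append(errors)
    return history

def xor_mlp(x1, x2):
    """Hand-picked weights (an assumption, not learned): output = OR and not AND."""
    h1 = step(-0.5 + x1 + x2)      # hidden unit 1 acts as OR
    h2 = step(-1.5 + x1 + x2)      # hidden unit 2 acts as AND
    return step(-0.5 + h1 - h2)    # output unit: OR but not AND, i.e. XOR

xor_data = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(perceptron_errors_per_epoch(xor_data))
print([xor_mlp(x1, x2) for x1, x2, _ in xor_data])   # [0, 1, 1, 0]
```

Every entry in the first list stays above zero, because a mistake-free epoch would require a straight line that separates the XOR classes, and none exists; the hidden layer is what makes the non-linear boundary possible.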

Kernelization techniques also help separate non-linearly arranged data. We have already seen this in the SVM section.

Consider the following example with 16 rows of data. X contains 2 features, and y gives the class that each row belongs to; both are provided for training. Since the data is divided into three classes (labeled 0, 1 and 2 in the code), this is a multi-class classification problem.
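One common way for a network to handle more than two classes is to give the output layer one unit per class, one-hot encode the labels, and turn the output scores into probabilities with softmax. The sketch below shows only that general idea; it is an assumption for illustration and is not a description of how mlxtend implements its MLP internally. The output scores are made-up values.

```python
import numpy as np

def one_hot(y, n_classes):
    """Encode integer labels (0, 1, 2, ...) as one-hot rows."""
    return np.eye(n_classes)[y]

def softmax(z):
    z = z - np.max(z)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

output_layer = np.array([0.5, -1.0, 2.0])   # hypothetical raw scores for classes 0, 1, 2
probs = softmax(output_layer)
print(one_hot(np.array([0, 2, 1]), 3))
print(probs.round(3), int(np.argmax(probs)))
```

The predicted class is the one with the highest probability, and training compares `probs` with the one-hot row of the true label.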


https://gist.github.com/nithyadurai87/b95e0ccd56464646da32ffdddb8b457f


from mlxtend.classifier import MultiLayerPerceptron as MLP
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
import numpy as np

# 16 training examples with 2 features each
X = np.asarray([[6.1, 1.4], [7.7, 2.3], [6.3, 2.4], [6.4, 1.8],
                [6.2, 1.8], [6.9, 2.1], [6.7, 2.4], [6.9, 2.3],
                [5.8, 1.9], [6.8, 2.3], [6.7, 2.5], [6.7, 2.3],
                [6.3, 1.9], [6.5, 2.1], [6.2, 2.3], [5.9, 1.8]])

# Standardize each feature (zero mean, unit variance)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Class label of each example
y = np.asarray([0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# One hidden layer of 50 units; minibatches=1 means batch gradient descent
nn = MLP(hidden_layers=[50], l2=0.00, l1=0.0, epochs=150, eta=0.05,
         momentum=0.1, decrease_const=0.0, minibatches=1,
         random_seed=1, print_progress=3)
nn = nn.fit(X, y)

fig = plot_decision_regions(X=X, y=y, clf=nn, legend=2)
plt.title('epochs = 150')
plt.show()
print('Accuracy(epochs = 150): %.2f%%' % (100 * nn.score(X, y)))

# Train again for more epochs
nn.epochs = 250
nn = nn.fit(X, y)
fig = plot_decision_regions(X=X, y=y, clf=nn, legend=2)
plt.title('epochs = 250')
plt.show()
print('Accuracy(epochs = 250): %.2f%%' % (100 * nn.score(X, y)))

# Cost per epoch with batch gradient descent
plt.plot(range(len(nn.cost_)), nn.cost_)
plt.title('Gradient Descent training (minibatches=1)')
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.show()

# Stochastic gradient descent: one minibatch per training example
nn.minibatches = len(y)
nn = nn.fit(X, y)
plt.plot(range(len(nn.cost_)), nn.cost_)
plt.title('Stochastic Gradient Descent (minibatches=no. of training examples)')
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.show()


When the MLP is trained on the given data, it divides the feature space into regions as shown below. We can see that one point that should have been classified under class 1 does not fall in its correct region and has been classified under class 0 instead.




Next, to improve the MLP's learning, the number of epochs is increased from 150 to 250 and the model is trained again. This time we can see that all the data are classified correctly.





nn.epochs = 250
nn = nn.fit(X, y)

fig = plot_decision_regions(X=X, y=y, clf=nn, legend=2)

plt.title('epochs = 250')
plt.show()






Thus, as shown below, the accuracy is 93.75% with 150 epochs and 100.00% with 250 epochs.

Iteration: 150/150 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00
Accuracy(epochs = 150): 93.75%

Iteration: 250/250 | Cost 0.04 | Elapsed: 0:00:00 | ETA: 0:00:00
Accuracy(epochs = 250): 100.00%

The graph below shows how the cost decreases with each epoch. Among the parameters passed when training the MLP, minibatches is set to 1, so the MLP learns with (batch) gradient descent, using all the training data together in each update. We came across this earlier, in the perceptron section.


[Figure: Gradient Descent training (minibatches=1), cost vs. epochs]



If minibatches is set to the number of training examples, the MLP learns using stochastic gradient descent. Instead of computing the cost over all the training data together, stochastic gradient descent reduces the cost one training example at a time. That is why, in the graph below, the cost does not fall smoothly from epoch to epoch but moves in a zig-zag pattern.

When the number of training examples is very large, ordinary gradient descent has to go through all of them for every single update, so it can take a long time to reach the global optimum. In such situations stochastic gradient descent can be used instead. Ordinary gradient descent, which uses all the data for each update, is also called batch gradient descent.



[Figure: Stochastic Gradient Descent (minibatches=no. of training examples), cost vs. epochs]
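The difference between batch and stochastic gradient descent can be seen on a tiny problem. The sketch below uses linear regression with a mean-squared-error cost on synthetic data (an assumption chosen because the cost is easy to compute; it is not the MLP above). Batch gradient descent makes one update per epoch from all examples; stochastic gradient descent makes one update per example, in shuffled order.

```python
import numpy as np

def mse(w, b, X, y):
    return float(np.mean((X @ w + b - y) ** 2))

def batch_gd(X, y, lr=0.05, epochs=50):
    """One update per epoch, using the gradient over ALL examples."""
    w, b = np.zeros(X.shape[1]), 0.0
    costs = []
    for _ in range(epochs):
        err = X @ w + b - y
        w -= lr * 2 * X.T @ err / len(y)
        b -= lr * 2 * err.mean()
        costs.append(mse(w, b, X, y))
    return costs

def sgd(X, y, lr=0.05, epochs=50, seed=0):
    """One update per training example (minibatches = number of examples)."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    costs = []
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            err = X[i] @ w + b - y[i]
            w -= lr * 2 * err * X[i]
            b -= lr * 2 * err
        costs.append(mse(w, b, X, y))
    return costs

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0]) + 0.5 + rng.normal(scale=0.3, size=100)

batch_costs = batch_gd(X, y)
sgd_costs = sgd(X, y)
print(batch_costs[0], batch_costs[-1])
print(sgd_costs[0], sgd_costs[-1])
```

With a small enough learning rate the batch cost falls steadily at every epoch, while the stochastic cost drops quickly in the first epoch and then fluctuates around the minimum, because each update looks at only one example.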









25. Conclusion

Machine learning does not end here; the field keeps expanding. Deep Learning, AI and Neural Networks see new developments every day. I encourage you to keep learning about them through online courses, blogs, sites such as StackOverflow, and videos.




26. Videos

You can watch all of these as videos in the following YouTube playlist:

https://www.youtube.com/watch?v=iHG8We58HVY&list=PL5itdT07Pm8wxRaPWTiPntnBmnOs4ExDM

ML-01 Introduction to Machine Learning in Tamil - Machine Learning: An Introduction
https://www.youtube.com/watch?v=iHG8We58HVY

ML-02 Introduction to Machine Learning Algorithms in Tamil
https://www.youtube.com/watch?v=AYMuT05i4gE

ML-03 Pandas - An Introduction - Introduction to Pandas in Tamil
https://www.youtube.com/watch?v=lirK84iZv7g

ML-04 Machine Learning Model Creation in Tamil
https://www.youtube.com/watch?v=Nz6iJOZli-k

ML-05 Machine Learning Model - Prediction - in Tamil
https://www.youtube.com/watch?v=05HMDKepzRc

ML-06 Feature Selection - Manipulated variable - Disturbance Variable
https://www.youtube.com/watch?v=H85tTH HFMw

ML-07 Feature Selection - Process Variable - RFE Technique - In Tamil
https://www.youtube.com/watch?v=DyqlK24vlso

ML 08 - Machine Learning in Tamil - 08 - Improving Model Score
https://www.youtube.com/watch?v=6clvCfhI6qI

ML 09 - Machine Learning in Tamil - Outliers Removal
https://www.youtube.com/watch?v=SfBNynpsoyO

ML 10 - Machine Learning in Tamil - Explanatory data Analysis
https://www.youtube.com/watch?v=SliSuYJ-xiU

ML 11 - Machine Learning in Tamil - Simple Linear Regression
https://www.youtube.com/watch?v=OB36E9wlPI

ML 12 - Machine Learning in Tamil - Gradient Descent
https://www.youtube.com/watch?v= 3Cfw2gmOhI

ML 13 - Machine Learning in Tamil - Multiple Linear Regression
https://www.youtube.com/watch?v=ECK4bjIrWjw

ML 14 - Machine Learning in Tamil - Polynomial Regression
https://www.youtube.com/watch?v=8dJML0Xvzro

ML 15 - Machine Learning in Tamil - Feature extraction using vectors
https://www.youtube.com/watch?v=Xktzn9XxGg

ML 16 - Machine Learning in Tamil - Natural Language ToolKit
https://www.youtube.com/watch?v=vZLG5hOIvPM

ML 17 - Machine Learning in Tamil - Logistic Regression
https://www.youtube.com/watch?v=dXEnjS7Xjqs

ML 18 - Machine Learning in Tamil - Multi class classification
https://www.youtube.com/watch?v=R lXGhOlEoA

ML 19 - Machine Learning in Tamil - Neural Networks
https://www.youtube.com/watch?v=8pOBrh7bfqs








27. Other ebooks by the author

http://freetamilebooks.com/authors/nithyaduraisamy/

About Kaniyam


Goals

To grow into a platform that continuously provides the information needed by anyone who wants to learn about free software technology, from its simplest topics to its most advanced aspects.

• To provide content in many media: text, audio and video.

• To report on events in this field.

• To provide content under licenses open to everyone, so that anyone can contribute.

• To publish content in print, as books, and on discs.

To contribute

• Anyone who is interested can contribute.

• The content must be related to free software technology.

• Before you start contributing, you must assign your copyright to Kaniyam.

editor@kaniyam.com

Anyone can start contributing after sending an email with the following details, as a declaration, to the address above.

• Subject of the email: Copyright assignment

• Content of the email:

I affirm that all the works I send to Kaniyam are created first for Kaniyam.

• For this purpose, any rights that I may hold
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Kaniyam Foundation 

Account Number: 606 1010 100 502 
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Union Bank Of India 
West Tambaram, Chennai 



IFSC - UBIN0560618 


Account Type : Current Account 
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BHIM UPI Payments Accepted at 

Kaniyam Foundation 



Account Number: 606101010050279. IFSC Code: UBIN0560618 
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